Abstract: We introduce two novel procedures to test the nullity of the slope 
function in the functional linear model with real output. The test statistics combine 
multiple testing ideas and random projections of the input data through functional 
Principal Component Analysis. Interestingly, the procedures are completely data- 
driven and do not require any prior knowledge on the smoothness of the slope nor 
on the smoothness of the covariate functions. The levels and powers against local 
alternatives are assessed in a nonasymptotic setting. This allows us to prove that 
these procedures are minimax adaptive (up to an unavoidable log log n multiplicative 
term) to the unknown regularity of the slope. As a side result, the minimax separation 
distances of the slope are derived for a large range of regularity classes. A numerical 
study illustrates these theoretical results. 

1. Introduction 

Consider the following functional linear regression model, where the scalar response $Y$ is 
related to a square integrable random function $X(\cdot)$ through 

\[ Y = \omega + \int_{\mathcal{T}} X(t)\,\theta(t)\,dt + \varepsilon . \tag{1} \]

Here, $\omega$ is a constant, denoting the intercept of the model, $\mathcal{T}$ is the domain of $X(\cdot)$, 
$\theta(\cdot)$ is an unknown function representing the slope function, and $\varepsilon$ is a centered random 
noise variable. In functional linear regression, much interest focuses on the nonparametric 
estimation of $\theta(\cdot)$ in (1), given an i.i.d. sample $(X_i, Y_i)_{1 \le i \le n}$ of $(X, Y)$. Testing whether 
$\theta$ belongs to a given finite dimensional linear subspace $V$ is a question that arises in 
different problems such as dimension reduction, goodness-of-fit analysis, or lack-of-effect 
tests of a functional variable. While the properties of estimators of $\theta$ are widely discussed in 
the literature, there is still a great need for generic test procedures supported by 
strong theoretical properties. This is the problem addressed in the present paper. 

Let us reformulate the functional model (1) as a generic linear regression model in an 
infinite dimensional space. The random function $X$ is assumed to belong to some separable 
Hilbert space, henceforth denoted $\mathcal{H}$, endowed with the inner product $\langle \cdot, \cdot \rangle$. Examples of 
$\mathcal{H}$ include $L^2([0,1])$ or the Sobolev space $W_2^m([0,1])$. For the sake of clarity, we consider that 
$\omega = 0$ and that $X$ and $Y$ are centered. Thus, assuming that $\theta$ also belongs to $\mathcal{H}$, the 
statistical model (1) is rephrased as 

\[ Y = \langle X, \theta \rangle + \varepsilon , \tag{2} \]

where $\varepsilon$ is a centered random variable independent from $X$ with unknown variance $\sigma^2$. In 
the sequel, we note $\mathbf{X}$ and $\mathbf{Y}$ the size $n$ vectors of i.i.d. observations $X_i$ and $Y_i$ ($1 \le i \le n$), 
while $\boldsymbol{\varepsilon}$ stands for the size $n$ vector of the noise. 

In essence, testing a linear hypothesis of the form "$\theta \in V$" is as difficult as testing 
"$\theta = 0$" when a parametric estimator of $\theta$ in $V$ is computed. Therefore we consider the 
problem of testing: 

\[ H_0 : \text{"}\theta = 0\text{"} \quad \text{against} \quad H_1 : \text{"}\theta \neq 0\text{"} \]

given an i.i.d. sample $(\mathbf{X}, \mathbf{Y})$ from model (2). The extension to general subspaces $V$ is 
developed in the discussion section. 

Most testing procedures are based on ideas that have been originally developed for the 
estimation of $\theta$. We briefly review the main approaches and the corresponding results in 
estimation. 

A first class of procedures is based on the minimization of a least-square type criterion 
penalized by a roughness term that assesses the "plausibility" of $\theta$. Such approaches 
include smoothing spline estimators [7, 14], thresholding projection estimators [9], or 
reproducing kernel Hilbert space methods [38]. A second class of procedures is based on 
the functional principal components analysis (PCA) of $X$ [10, 22]. It consists in estimating 
$\theta$ in a finite dimensional space spanned by the $k$ first eigenfunctions of the empirical 
covariance operator of $X$. The main difference with the previous class of estimators lies in 
the fact that the finite dimensional space is estimated from the observations of the process 
$X$. See the survey [11] and references therein for an overview of these two approaches. 

The theoretical properties of these classes of estimators have been investigated from 
different viewpoints: prediction [7, 10, 14, 38] (estimation of $\langle X_{n+1}, \theta \rangle$ where $X_{n+1}$ follows 
the same distribution as $X$), pointwise prediction [5] (estimation of $\langle x, \theta \rangle$ for a fixed $x \in \mathcal{H}$) 
or the inverse problem [12, 22] (estimation of $\theta$). For these three objectives, optimal rates 
of convergence have been derived and some of the aforementioned procedures have been 
shown to asymptotically achieve this rate [5, 14, 38, 22]. Recently, some non-asymptotic 
results have emerged [12, 13] for estimation procedures that rely on a prescribed basis of 
functions (e.g. splines). Most of these estimation procedures rely on tuning parameters 
whose optimal value depends on quantities such as the noise variance or the smoothness of 
$\theta$. In fact, there is a longstanding gap in the literature between theory, where the variance 
$\sigma^2$, the smoothness of $\theta$ and the smoothness of the covariance operator of $X$ are generally 
assumed to be known, and practice, where they are unknown. 

The literature on tests in the functional linear model is scarce. In [6], Cardot et al. 
introduced a test statistic based on the $k$ first components of the functional PCA of $X$. 
Its limiting distribution is derived under $H_0$ and the power of the corresponding test is 
proved to converge to one under $H_1$. The main drawback of the procedure is that the 
number $k$ of components involved in the statistic has to be set. As for estimation, setting 
$k$ is arguably a difficult problem. To bypass this calibration issue, one may apply a 
permutation approach [8] or use bootstrap methodologies [15, 21]. While the levels of the 
corresponding tests are asymptotically controlled, there is again no theoretical guarantee 
on the power. 

In this paper, our objective is to introduce automatic testing procedures whose powers 
are optimal from a nonasymptotic viewpoint. 

As a first step, we introduce in Section 3 Fisher-type non-adaptive tests, $T_{\alpha,k}$, 
corresponding to projections of $\mathbf{Y}$ on the $k$ first principal components of $X$. We study their 
levels and powers in Sections 3 and 4. Under moment assumptions on $\varepsilon$ and mild 
assumptions on the covariance of $X$, the level is smaller than $\alpha$ up to a $\log^{-1}(n)$ additional term, 
and a sharp control of the power is provided. Such results are comparable to state of 
the art results in nonparametric regression [3, 35]. In our setting, the main difficulty in 
the proof is to control the randomness of the principal components of $X$. The arguments 
rely on the perturbation theory of operators. While other estimation or testing 
procedures based on the Karhunen-Loeve expansion have only been analyzed in an asymptotic 
setting [5, 6, 22], our nonasymptotic results rely on less restrictive assumptions on $X$ 
than those commonly used in the literature. In Section 4, we assess the optimality of the 
parametric test $T_{\alpha,k}$ in the minimax sense. The notion of minimaxity of a level-$\alpha$ test $T_\alpha$ 
is related to the separation distance of $T_\alpha$ over some class $\Theta$ of functions (e.g. a Sobolev 
ball). Intuitively, the power of a reasonable test $T_\alpha$ should be large when the norm of $\theta$ is 
large, while the power of $T_\alpha$ is close to $\alpha$ when $\theta$ is close to 0. For the problem of testing $H_0$: 
"$\theta = 0$" against $H_{1,\Theta}$: "$\theta \in \Theta \setminus \{0\}$", the separation distance corresponds to the smallest 
distance $\rho$ such that $T_\alpha$ rejects $H_0$ with probability larger than $1 - \beta$ for all $\theta \in \Theta$ whose 
norm is larger than $\rho$. The smaller the separation distance, the more powerful the test 
$T_\alpha$ is. The minimax separation distance over $\Theta$ is the smallest separation distance that 
is achieved by a level-$\alpha$ test. A test achieving this minimax separation distance is said to 
be minimax over $\Theta$. Separation distances and minimax separation distances are formalized in Section 4.2. In 
the nonparametric regression setting, minimax separation distances have been derived in 
an asymptotic [27, 28, 29] and a nonasymptotic [2] setting. In this paper, the separation 
distances of our testing procedures are nonasymptotically controlled. We derive minimax 
separation distances in the functional model (2) for a wide class of ellipsoids. We show 
that the parametric test $T_{\alpha,k}$ achieves the optimal rate of detection when the dimension 
$k$ is suitably chosen. 

In practice, the regularity of $\theta$ is unknown, whereas the choice of $k$ in $T_{\alpha,k}$ depends 
on unknown quantities such as the regularity of $X$ or the regularity of $\theta$. Thus, assuming 
a priori that the function $\theta$ belongs to a particular smoothness class $\Theta$ and building an 
optimal test over $\Theta$ may lead to poor performances, for instance if $\theta \notin \Theta$. For this reason, 
a more ambitious issue is to build a minimax adaptive testing procedure, that is, a 
procedure which is simultaneously minimax for a wide range of regularity classes $\Theta$. Minimax 
adaptive testing procedures have already been studied in the nonparametric regression 
setting, from an asymptotic [35] and a nonasymptotic [3] viewpoint. As a second step, 
we combine the parametric tests $T_{\alpha,k}$ with multiple testing techniques in the spirit of [3]. 
Two such multiple testing procedures are introduced in Section 5. They are completely 
data-driven: they require no tuning parameter whose optimal value would depend on $\theta$, the 
distribution of $X$ or on $\sigma$. Their levels and powers are analyzed from a nonasymptotic 
viewpoint in Sections 5 and 6. We prove that our multiple testing procedures are 
simultaneously minimax over the aforementioned class of ellipsoids (up to an unavoidable $\log \log n$ 
factor). As in the estimation setting [22], the minimax separation distances involve the 
common regularity of $\theta$ and $X$. 

The two multiple testing procedures are illustrated and compared by simulations in 
Section 7. Extensions of the approach are discussed in Section 8. Section 9 contains the 
main proofs while the lemmas involving perturbation theory are given in Section 10. All 
the technical and side results are postponed to appendices. 

2. Preliminaries 
2.1. Notations 

We recall that $\langle \cdot, \cdot \rangle$ and $\|\cdot\|$ respectively refer to the inner product and the corresponding 
norm in the Hilbert space $\mathcal{H}$. In contrast, $\langle \cdot, \cdot \rangle_n$ and $\|\cdot\|_n$ stand for the inner product and the 
Euclidean norm in $\mathbb{R}^n$. Furthermore, $\otimes$ refers to the tensor product. We assume henceforth 
that $X$ is centered and has a second moment, that is, $E[\|X\|^2] < \infty$. The covariance operator 
of $X$ is the linear operator $\Gamma$ defined on $\mathcal{H}$ as follows: 

\[ \Gamma h = E[X \otimes X h] = E[\langle h, X \rangle X] , \quad h \in \mathcal{H} . \]

It is well known that $\Gamma$ is a symmetric, positive trace-class (hence Hilbert-Schmidt) 
operator, which implies that $\Gamma$ is diagonalizable in an orthonormal basis. We denote $(\lambda_j)_{j \ge 1}$ the 
non-increasing sequence of eigenvalues of $\Gamma$, while the sequence $(V_j)_{j \ge 1}$ stands for a 
corresponding sequence of eigenfunctions. It follows that $\Gamma$ decomposes as $\Gamma = \sum_{j=1}^{\infty} \lambda_j V_j \otimes V_j$. 
For any integer $k \ge 1$, we note $\Gamma_k = \sum_{j=1}^{k} \lambda_j V_j \otimes V_j$ the operator such that $\Gamma_k h = \Gamma h$ 
for $h \in \mathrm{Vect}(V_1, \ldots, V_k)$ and $\Gamma_k h = 0$ if $h \in (V_1, \ldots, V_k)^{\perp}$. 

In the sequel, $C, C_1, \ldots$ denote positive universal constants that may vary from line to 
line. The notation $C(\cdot)$ specifies the dependency on some quantities. 

2.2. Karhunen-Loeve expansion and functional PCA 

We recall here a classical tool of functional data analysis : the Karhunen-Loeve expan- 
sion, denoted KL expansion in the sequel. 

Definition 2.1. There exists an expansion of $X$ in the basis $(V_j)_{j \ge 1}$: $X = \sum_{j \ge 1} \langle X, V_j \rangle V_j$. 
The real random variables $\langle X, V_j \rangle$ are centered (when $X$ is centered), uncorrelated, and 
with variance $\lambda_j$. As a consequence, there exists a collection $(\eta^{(j)})_{j \ge 1}$ of random variables 
that are centered, uncorrelated, and with unit variance such that 

\[ X = \sum_{j=1}^{\infty} \sqrt{\lambda_j}\, \eta^{(j)} V_j . \tag{3} \]

The decomposition is called the KL-expansion of $X$. 

The eigenfunction $V_j$ is the $j$-th principal direction, whose amount of variance coincides 
with $\lambda_j$. When $X$ is a Gaussian process, the $(\eta^{(j)})_{j \ge 1}$ form an i.i.d. sequence with 
$\eta^{(1)} \sim \mathcal{N}(0, 1)$. Since the eigenfunctions $(V_j)$ and the eigenvalues $(\lambda_j)$ are unknown in practice, they 
have to be estimated from the data using functional principal component analysis. In the 
sequel, we note $\Gamma_n$ the empirical covariance operator defined by 

\[ \Gamma_n h = \frac{1}{n} \sum_{i=1}^{n} X_i \otimes X_i h = \frac{1}{n} \sum_{i=1}^{n} \langle X_i, h \rangle X_i , \quad h \in \mathcal{H} . \]

Functional PCA estimates $(\lambda_j, V_j)$, $j \ge 1$, by diagonalizing the empirical 
covariance operator $\Gamma_n$. These empirical counterparts of $(\lambda_j, V_j)$ are denoted $(\hat\lambda_j, \hat V_j)$ in the 
sequel. 
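
The diagonalization of $\Gamma_n$ can be illustrated numerically. The following is a minimal sketch, not part of the procedures analyzed in this paper: it assumes that every curve is observed on a common uniform grid of $[0,1]$, so that the inner product of $L^2([0,1])$ is approximated by a grid average, and it extracts the leading eigenpairs by power iteration with deflation. The names `inner`, `apply_cov` and `functional_pca` are hypothetical helpers introduced for the illustration.

```python
import math
import random


def inner(f, g):
    """Grid-average approximation of the L2([0,1]) inner product."""
    return sum(a * b for a, b in zip(f, g)) / len(f)


def apply_cov(curves, h):
    """Empirical covariance operator: (1/n) * sum_i <X_i, h> X_i."""
    out = [0.0] * len(h)
    for x in curves:
        c = inner(x, h) / len(curves)
        for t, xt in enumerate(x):
            out[t] += c * xt
    return out


def functional_pca(curves, k, n_iter=200, seed=0):
    """Return the k leading empirical eigenvalues and eigenfunctions."""
    rng = random.Random(seed)
    eigvals, eigfuns = [], []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in curves[0]]
        for _ in range(n_iter):
            # Deflation: remove components along directions already found.
            for u in eigfuns:
                c = inner(v, u)
                v = [a - c * b for a, b in zip(v, u)]
            w = apply_cov(curves, v)
            norm = math.sqrt(inner(w, w))
            if norm < 1e-14:  # the remaining spectrum is numerically zero
                break
            v = [a / norm for a in w]
        eigvals.append(inner(apply_cov(curves, v), v))
        eigfuns.append(v)
    return eigvals, eigfuns
```

Power iteration is chosen here only to keep the sketch dependency-free; any symmetric eigensolver applied to the discretized operator would serve the same purpose. The number of iterations needed depends on the gaps between consecutive eigenvalues.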

Functional PCA is usually applied as a dimension reduction technique. One of its 
appealing features lies in its ability to capture most of the variance of $X$ by a 
$k$-dimensional projection on the space $\mathrm{Vect}(V_1, \ldots, V_k)$. For this reason, PCA is at the core 
of many procedures for functional data. After the seminal paper by Dauxois et al. [16], the 
convergence of the random eigenelements $(\hat\lambda_j, \hat V_j)$ has been assessed from an asymptotic 
point of view [23, 24, 25, 31]. One issue with such a dimension reduction method is 
the choice of the tuning parameter $k$, whose optimal value usually depends on unknown 
quantities. Besides, plugging the $(\hat\lambda_j, \hat V_j)$ into linear estimates creates non-linearity and 
usually introduces stochastic dependence. 

3. Parametric test 
3.1. Definition 

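In the sequel, $k$ denotes a positive integer smaller than $n/2$. As a first step, we consider 
the parametric testing problem of the hypotheses: 

\[ H_0 : \text{"}\theta = 0\text{"} \quad \text{against} \quad H_{1,k} : \text{"}\theta \in \mathrm{Vect}[(V_j)_{j=1,\ldots,k}] \setminus \{0\}\text{"} . \tag{4} \]

Given a dimension $k$ of the Karhunen-Loeve expansion, we note $\hat k^{KL}$ as $k \wedge \mathrm{Rank}(\Gamma_n)$. 
In order to introduce the parametric statistic, let us restate the functional linear model 
as a finite dimensional linear model. We consider the response vector $\mathbf{Y}$ of size $n$, the 
$n \times \hat k^{KL}$ design matrix $\mathbf{W}$ defined by $\mathbf{W}_{i,j} = \langle X_i, \hat V_j \rangle$ for $i = 1, \ldots, n$, $j = 1, \ldots, \hat k^{KL}$, the 
parameter vector $\vartheta$ defined by $\vartheta_j = \langle \theta, \hat V_j \rangle$, $j = 1, \ldots, \hat k^{KL}$, and the size $n$ noise vector $\tilde{\boldsymbol{\varepsilon}}$ 
defined by $\tilde\varepsilon_i = \varepsilon_i + \langle X_i, \theta \rangle - [\mathbf{W}\vartheta]_i$. The functional linear model is equivalently written 
as 

\[ \mathbf{Y} = \mathbf{W}\vartheta + \tilde{\boldsymbol{\varepsilon}} . \]

Intuitively, testing "$\vartheta = 0$" is a reasonable proxy for testing $H_0$ against $H_{1,k}$. For this 
reason, we propose a Fisher-type statistic. 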

Definition 3.1. In the sequel, $\hat\Pi_k$ stands for the orthogonal projection in $\mathbb{R}^n$ onto the 
space generated by the $\hat k^{KL}$ columns of $\mathbf{W}$. For any $k \le n/2$, we consider the statistic 
$\phi_k(\mathbf{Y}, \mathbf{X})$ defined by 

\[ \phi_k(\mathbf{Y}, \mathbf{X}) := \frac{\|\hat\Pi_k \mathbf{Y}\|_n^2}{\|\mathbf{Y} - \hat\Pi_k \mathbf{Y}\|_n^2 / (n - \hat k^{KL})} . \tag{5} \]

The main difference with a classical Fisher statistic comes from the fact that the 
projection $\hat\Pi_k$ is random. This projector is built using the $\hat k^{KL}$ first directions $(\hat V_1, \hat V_2, \ldots, \hat V_{\hat k^{KL}})$ 
of the empirical Karhunen-Loeve expansion of $X$. Let us call $\Pi_k$ the orthogonal projector 
in $\mathbb{R}^n$ onto the space spanned by $(\langle X_i, V_j \rangle)_{i=1,\ldots,n}$, $j = 1, \ldots, k$. If we knew the basis $(V_j)$, 
$j \ge 1$, in advance, we would use this orthogonal projector instead of $\hat\Pi_k$. We shall prove 
that, under $H_0$, $\phi_k(\mathbf{Y}, \mathbf{X}) / \hat k^{KL}$ behaves like a Fisher distribution with $(\hat k^{KL}, n - \hat k^{KL})$ 
degrees of freedom. 

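Definition 3.2 (Parametric tests). Fix $\alpha \in (0, 1)$. We reject $H_0$ against $H_{1,k}$ when the 
statistic 

\[ T_{\alpha,k} := \phi_k(\mathbf{Y}, \mathbf{X}) - \hat k^{KL}\, \bar F^{-1}_{\hat k^{KL},\, n - \hat k^{KL}}(\alpha) \tag{6} \]

is positive. Here, $\bar F^{-1}_{D,N}(u)$ denotes the $(1-u)$-quantile of a Fisher distribution with $D$ and $N$ degrees of freedom. 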
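
To make Definitions 3.1 and 3.2 concrete, here is a minimal sketch of the statistic (5) and of the test (6). It assumes that the columns of the score matrix $\mathbf{W}$ have already been computed and are passed as a list of length-$n$ lists (a hypothetical input format). The projection is obtained by Gram-Schmidt orthogonalization, and the Fisher quantile is approximated by Monte Carlo simulation of a ratio of chi-square variables rather than evaluated exactly; both choices, and all function names, are assumptions made for the illustration.

```python
import math
import random


def projection_norm_sq(y, columns):
    """Squared norm of the projection of y onto span(columns), and the rank."""
    basis, total = [], 0.0
    for col in columns:
        v = list(col)
        for q in basis:  # modified Gram-Schmidt step
            c = sum(a * b for a, b in zip(v, q))
            v = [a - c * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm < 1e-12:  # skip linearly dependent columns
            continue
        q = [a / norm for a in v]
        basis.append(q)
        total += sum(a * b for a, b in zip(y, q)) ** 2
    return total, len(basis)


def fisher_type_statistic(y, columns):
    """Statistic (5): ||Pi y||^2 / (||y - Pi y||^2 / (n - rank))."""
    proj, rank = projection_norm_sq(y, columns)
    resid = sum(a * a for a in y) - proj
    return proj / (resid / (len(y) - rank)), rank


def fisher_upper_quantile_mc(d1, d2, alpha, n_sim=5000, seed=0):
    """Monte Carlo approximation of the (1 - alpha)-quantile of F(d1, d2)."""
    rng = random.Random(seed)
    draws = sorted(
        (rng.gammavariate(d1 / 2.0, 2.0) / d1)
        / (rng.gammavariate(d2 / 2.0, 2.0) / d2)
        for _ in range(n_sim)
    )
    return draws[min(n_sim - 1, int((1.0 - alpha) * n_sim))]


def parametric_test(y, columns, alpha):
    """Statistic (6); the null hypothesis is rejected when it is positive."""
    phi, rank = fisher_type_statistic(y, columns)
    return phi - rank * fisher_upper_quantile_mc(rank, len(y) - rank, alpha)
```

For instance, with two orthogonal score columns in $\mathbb{R}^6$ and a response mostly carried by these columns, `parametric_test` returns a positive value, whereas a response orthogonal to the columns yields a negative one.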

Remark 3.1 (Other interpretations of $\phi_k(\mathbf{Y}, \mathbf{X})$). Consider $\hat\theta_k$, the least-squares 
estimator of $\theta$ in the space generated by $\hat V_j$, $j = 1, \ldots, \hat k^{KL}$. It is proved in Section 9.2 that 
$\|\hat\Pi_k \mathbf{Y}\|_n^2 = \|\Gamma_n^{1/2} \hat\theta_k\|^2$. Thus, the numerator of (5) corresponds to some norm of $\hat\theta_k$. 
Intuitively, the larger $\hat\theta_k$, the larger the statistic $\phi_k(\mathbf{Y}, \mathbf{X})$ is. Furthermore, $\|\hat\Pi_k \mathbf{Y}\|_n^2$ can also 
be expressed as the numerator of the statistic $D_n$ considered in Cardot et al. [6] (see Section 
9.2 for details). 

Remark 3.2. From the considerations above, we see that the transformed parameter $\Gamma^{1/2}\theta$ 
naturally occurs in the definition of $\phi_k(\mathbf{Y}, \mathbf{X})$. In fact, hypotheses $H_0$ and $H_{1,k}$ remain 
unchanged if we replace $\theta$ by $\Gamma^{1/2}\theta$ in (4), as soon as $\Gamma$ is injective. The crucial role of 
this synthetic parameter is underlined in [32], where the functional linear regression model 
is proved to be asymptotically equivalent to a white noise model with signal $\Gamma^{1/2}\theta$. 

3.2. Size 

We study the type I error of the parametric tests $T_{\alpha,k}$. On one hand, we control exactly 
the size of the tests when the noise $\varepsilon$ is normally distributed. On the other hand, we bound 
the size of the tests when the noise is only constrained to admit a fourth moment. 

3.2.1. Gaussian noise 

A.1 $\varepsilon$ follows a Gaussian distribution $\mathcal{N}(0, \sigma^2)$. 

Proposition 3.3 (Size of $T_{\alpha,k}$ under Gaussian errors). Under Assumption A.1 and if 
$k \le n/2$, we have for any $n \ge 2$, $\mathbb{P}_0(T_{\alpha,k} > 0) = \alpha$. 

Observe that this control does not require any assumption on the process $X$. 



3.2.2. Non-Gaussian noise 

In this part, the noise $\varepsilon$ is only assumed to admit a fourth order moment, but we make 
additional assumptions on $X$ and $k$. 

B.1 $\sup_{j \ge 1} E[(\eta^{(j)})^4] \le C_1$ and $E[\varepsilon^4] / \sigma^4 \le C_2$, 
where $C_1$ and $C_2$ are two positive constants. 

B.2 For some $\gamma > 0$, $\big(j \lambda_j (\log^{1+\gamma}(j) \vee 1)\big)_{j \ge 1}$ is decreasing and $\mathrm{Ker}\, \Gamma = \{0\}$. 

B.3 $k \le n^{1/4} / \log^4(n)$. 

Assumption B.1 is classical, since we need to control second order moments for the 
empirical covariance operator $\Gamma_n$. This comes down to inspecting the behavior of the fourth 
order moments of the $\eta^{(j)}$'s. The second part of B.2 ensures that the framework is truly 
functional. The first part of B.2 is mild and holds for an $X$ that may have very irregular 
paths (it holds for the Brownian motion, for which $\lambda_j \propto j^{-2}$) and for classical examples 
of eigenvalue sequences: with polynomial decay, exponential decay, or even Laurent 
sequences such as $\lambda_j = j^{-\delta} \log^{-\nu}(j)$ for $\delta > 1$ and $\nu > 0$. In fact, B.2 is less restrictive 
than assumptions commonly used in the literature [5, 6, 22] since it does not require any 
spacing control between the eigenvalues. 

The restriction B.3 on the dimension of the projection is classical for the analysis of 
statistical procedures based on the Karhunen-Loeve expansion. If we knew the 
eigenfunctions $V_k$ of $\Gamma$ in advance, we could consider larger dimensions $k$. The estimation of the 
eigenfunctions $V_k$ becomes more difficult when $k$ increases. By considering dimensions $k$ 
that satisfy Assumption B.3, we prove in the next theorem that the random projector $\hat\Pi_k$ 
concentrates well around its mean. Note that this assumption links $k$ and $n$ 
independently of the eigenvalues, hence of any prior knowledge on the data. 

Theorem 3.4 (Size of $T_{\alpha,k}$). Under Assumptions B.1-B.3, there exist positive constants 
$C(\alpha, \gamma)$ and $C_2$ such that the following holds. For any $n \ge C_2$, we have 

\[ \mathbb{P}_0[T_{\alpha,k} > 0] \le \alpha + \frac{C(\alpha, \gamma)}{\log(n)} . \]

Remark 3.3. In the proof of Theorem 3.4, we show that, under $H_0$, the distribution of 
$\phi_k(\mathbf{Y}, \mathbf{X})$ is close to a $\chi^2$ distribution with $k$ degrees of freedom. The arguments rely on 
perturbation theory for random operators (see Section 10). 



Hilgert et al./ Functional tests 






4. Power and minimaxity of $T_{\alpha,k}$ 

Intuitively, the larger the signal-to-noise ratio $E[\langle X, \theta \rangle^2] / \sigma^2 = \|\Gamma^{1/2}\theta\|^2 / \sigma^2$ is, the more easily 
we can reject $H_0$. For this reason, we study how large $\|\Gamma^{1/2}\theta\|^2$ has to be so that the 
test $T_{\alpha,k}$ rejects $H_0$ with probability larger than $1 - \beta$ for a prescribed positive number 
$\beta$. We control such type II errors under moment assumptions on $\varepsilon$. Additional controls of 
the power when $\varepsilon$ follows a Normal distribution are stated in Appendix A. 

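4.1. Power of $T_{\alpha,k}$ 

B.4 $\sup_{j \ge 1} E[(\eta^{(j)})^8] \le C$. 

Theorem 4.1 (Power under non-Gaussian errors). Let $\alpha$ and $\beta$ be fixed. Under B.1-B.4, 
there exist positive constants $C(\gamma)$, $C_1$, $C_2$, and $C_3$ such that the following holds. Assume 
that $\alpha \ge e^{-\sqrt{n}}$, $\beta \ge C(\gamma) / \log(n)$, and that $n \ge C_3$. Then, $\mathbb{P}_\theta(T_{\alpha,k} > 0) \ge 1 - \beta$ for any 
$\theta$ satisfying 

\[ \|\Gamma^{1/2}\theta\|^2 \ge C_1 \big\|(\Gamma^{1/2} - \Gamma_k^{1/2})\theta\big\|^2 + C_2 \frac{\sigma^2}{n} \left( \sqrt{k \log\Big(\frac{1}{\alpha\beta}\Big)} + \log\Big(\frac{1}{\alpha\beta}\Big) \right) . \tag{7} \]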

Remark 4.1. If we knew that $\theta$ belongs to the space spanned by the $k$ first eigenvectors 
$(V_1, \ldots, V_k)$ and if we knew these $k$ eigenvectors in advance, then we could consider the 
statistic defined by 

\[ \tilde T_{\alpha,k}(\mathbf{X}, \mathbf{Y}) := \frac{\|\Pi_k \mathbf{Y}\|_n^2}{\|\mathbf{Y} - \Pi_k \mathbf{Y}\|_n^2 / (n - k)} - k\, \bar F^{-1}_{k,\, n-k}(\alpha) , \]

where $\Pi_k$ is the projection in $\mathbb{R}^n$ onto the space spanned by $(\langle X_i, V_j \rangle)_{i=1,\ldots,n}$, $j = 1, \ldots, k$. 
The corresponding test is optimal in the minimax sense and rejects $H_0$ with probability 
larger than $1 - \beta$ when 

\[ \|\Gamma^{1/2}\theta\|^2 \ge C(\alpha, \beta) \sqrt{k}\, \sigma^2 / n . \tag{8} \]

See [37] for a proof when $X$ is a Gaussian process and $\varepsilon$ follows a Gaussian distribution, 
the extension to non-Gaussian processes being straightforward. In (7), we recover an 
additional term $\|(\Gamma^{1/2} - \Gamma_k^{1/2})\theta\|^2$ because we do not assume that $\theta$ belongs to the space 
spanned by $(V_1, \ldots, V_k)$. The statistic $\phi_k(\mathbf{Y}, \mathbf{X})$ only captures the projection of $\theta$ onto 
$\mathrm{span}(V_1, \ldots, V_k)$. In fact, the test $T_{\alpha,k}$ rejects with large probability when 

\[ \|\Gamma_k^{1/2}\theta\|^2 = \|\Gamma^{1/2}\theta\|^2 - \big\|(\Gamma^{1/2} - \Gamma_k^{1/2})\theta\big\|^2 \]

is large. 

Remark 4.2 (Joint regularity of $\Gamma$ and $\theta$). Looking more precisely at the bias term, we 
obtain 

\[ \big\|(\Gamma^{1/2} - \Gamma_k^{1/2})\theta\big\|^2 = \sum_{j=k+1}^{\infty} \lambda_j \langle \theta, V_j \rangle^2 . \]

Consequently, the bias term does not only depend on the rate of convergence of the 
eigenvalues of $\Gamma$; it also depends on the behavior of the sequence $\lambda_j \langle \theta, V_j \rangle^2$. In other words, the 
joint regularity of the covariance operator $\Gamma$ and of $\theta$ (in the expansion of $(V_j)$, $j \ge 1$) 
plays a role in the bias term. For a fixed $\theta$, the power of $T_{\alpha,k}$ is large for a tuning 
parameter $k$ that achieves a trade-off between the bias term $\|(\Gamma^{1/2} - \Gamma_k^{1/2})\theta\|^2$ and a variance 
term $\sqrt{k}\, \sigma^2 / n$. 
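
The trade-off described in Remark 4.2 can be explored numerically. The sketch below is only illustrative: it assumes that the eigenvalues $\lambda_j$ and the coefficients $\langle \theta, V_j \rangle$ are known and truncated to finite lists (which is never the case in practice), ignores the constants $C_1$ and $C_2$ of (7), and returns the dimension minimizing the sum of the bias term and of the variance term. The function names are hypothetical.

```python
import math


def bias_term(eigvals, coefs, k):
    """sum_{j > k} lambda_j <theta, V_j>^2 for sequences given as finite lists."""
    return sum(lam * c * c for lam, c in zip(eigvals[k:], coefs[k:]))


def variance_term(k, sigma2, n):
    """sqrt(k) * sigma^2 / n."""
    return math.sqrt(k) * sigma2 / n


def balanced_dimension(eigvals, coefs, sigma2, n):
    """Dimension k in 1..len(eigvals) minimizing bias_term + variance_term."""
    return min(
        range(1, len(eigvals) + 1),
        key=lambda k: bias_term(eigvals, coefs, k) + variance_term(k, sigma2, n),
    )
```

With $\lambda_j = j^{-2}$, $\langle \theta, V_j \rangle = j^{-1}$, $\sigma^2 = 1$ and $n = 1000$, the bias term decreases in $k$ while the variance term increases, and the two are balanced for a moderate dimension.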
4.2. Minimax separation distance over an ellipsoid 



In this section, we assess the optimality of the procedure $T_{\alpha,k}$. To this end, we study the 
optimal power of a level-$\alpha$ test when $\theta$ is assumed to have a known regularity. 

Definition 4.2 (Ellipsoids). Given a non-increasing sequence $(a_i)_{i \ge 1}$ and a positive 
number $R > 0$, we define the ellipsoid $\mathcal{E}_a(R)$ by 

\[ \mathcal{E}_a(R) := \left\{ \theta \in \mathcal{H} , \ \sum_{k=1}^{\infty} \frac{\langle \theta, V_k \rangle^2}{a_k^2} \le R^2 \right\} . \]

The ellipsoid $\mathcal{E}_a(R)$ contains all the elements $\theta \in \mathcal{H}$ that have a given regularity in the 
basis $(V_k)$, $k \ge 1$. In other words, it prescribes the rate of convergence of $\langle \theta, V_k \rangle$ towards 
0. The faster $a_k$ goes to zero, the more regular $\theta$ is assumed to be. 
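
As a small illustration of Definition 4.2, the following sketch checks the ellipsoid constraint for a function $\theta$ described by finitely many coefficients $\langle \theta, V_k \rangle$. Truncating the expansion to a finite list and the name `in_ellipsoid` are assumptions made for the example.

```python
def in_ellipsoid(coefs, a, radius):
    """Check sum_k <theta, V_k>^2 / a_k^2 <= R^2 for finite coefficient lists."""
    return sum((c / ak) ** 2 for c, ak in zip(coefs, a)) <= radius ** 2
```

For example, with $a_k = 1/k$ and $\langle \theta, V_k \rangle = k^{-2}$, the weighted sum equals $\sum_k k^{-2} < \pi^2/6 \approx 1.645$, so $\theta$ belongs to $\mathcal{E}_a(1.3)$ but not to $\mathcal{E}_a(1.2)$ once enough coefficients are included.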

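We take some positive numbers $\alpha$ and $\beta$ such that $\alpha + \beta < 1$. Let us consider a test $T$ 
taking its values in $\{0, 1\}$. For any subset $\mathcal{C} \subset \mathcal{H} \times \mathbb{R}^+$, $\beta[T; \mathcal{C}]$ denotes the supremum of 
type II errors of the test $T$ for all parameters $(\theta, \sigma) \in \mathcal{C}$: 

\[ \beta[T; \mathcal{C}] := \sup_{(\theta, \sigma) \in \mathcal{C}} \mathbb{P}_\theta[T = 0] . \]

The $(\alpha, \beta)$-separation distance of an $\alpha$-level test $T$ over the ellipsoid $\mathcal{E}_a(R)$, noted 
$\rho[T; \mathcal{E}_a(R)]$, is the minimal number $\rho > 0$ such that $T$ rejects $H_0$ with probability larger 
than $1 - \beta$ for all $\theta \in \mathcal{E}_a(R)$ and $\sigma > 0$ such that $\|\Gamma^{1/2}\theta\|^2 / \sigma^2 \ge \rho^2$. Hence, $\rho[T; \mathcal{E}_a(R)]$ 
corresponds to the minimal distance such that the hypotheses $\{\theta = 0, \sigma > 0\}$ and 
$\{\theta \in \mathcal{E}_a(R), \sigma > 0, \|\Gamma^{1/2}\theta\|^2 / \sigma^2 \ge \rho^2\}$ are well separated by $T$: 

\[ \rho[T; \mathcal{E}_a(R)] := \inf \left\{ \rho > 0 , \ \beta\left[ T; \left\{ \theta \in \mathcal{E}_a(R), \sigma > 0, \frac{\|\Gamma^{1/2}\theta\|^2}{\sigma^2} \ge \rho^2 \right\} \right] \le \beta \right\} . \]

By definition, $T$ has a power larger than $1 - \beta$ for all $\theta \in \mathcal{E}_a(R)$ and $\sigma > 0$ such that 
$\|\Gamma^{1/2}\theta\|^2 / \sigma^2 \ge \rho^2[T; \mathcal{E}_a(R)]$. 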

Definition 4.3 (Minimax separation distance). We consider 

\[ \rho^*[\alpha; \mathcal{E}_a(R)] := \inf_{T_\alpha} \rho[T_\alpha; \mathcal{E}_a(R)] , \tag{9} \]

where the infimum runs over all level-$\alpha$ tests. This quantity is called the $(\alpha, \beta)$-minimax 
separation distance over the ellipsoid $\mathcal{E}_a(R)$. 

Remark 4.3. The notion of $(\alpha, \beta)$-minimax separation distance is a nonasymptotic 
counterpart of the detection boundaries studied in the Gaussian sequence model [17]. 
Furthermore, as the variance $\sigma^2$ is unknown, this definition of the minimax separation 
distance considers the power of the testing procedures for all possible values of $\sigma^2$. 

Proposition 4.4 (Minimax lower bound over an ellipsoid). There exists a constant 
$C(\alpha, \beta)$ such that the following holds. Let us assume that $X$ is a Gaussian process and 
that $\varepsilon$ follows a Gaussian distribution. For any ellipsoid $\mathcal{E}_a(R)$, we have 

\[ \big(\rho^*[\alpha; \mathcal{E}_a(R)]\big)^2 \ge \rho^2_{a,R,n} := \sup_{k \ge 1} \left[ \left( C(\alpha, \beta) \frac{\sqrt{k}}{n} \right) \wedge \big( R^2 a_k^2 \lambda_k \big) \right] . \tag{10} \]

In other words, for any test $T_\alpha$ of level $\alpha$, we have 

\[ \beta\left[ T_\alpha; \left\{ \theta \in \mathcal{E}_a(R), \sigma > 0, \frac{\|\Gamma^{1/2}\theta\|^2}{\sigma^2} \ge \rho^2_{a,R,n} \right\} \right] \ge \beta . \]

Consequently, the $(\alpha, \beta)$-minimax separation distance over $\mathcal{E}_a(R)$ is lower bounded by 
$\rho_{a,R,n}$. The next proposition states the corresponding upper bound. 

Corollary 4.5 (Minimax upper bound). Under B.1, B.2, and B.4, there exist positive constants 
$C(\gamma)$, $C_2$, $C_3(\alpha, \gamma)$, and $C_4(\alpha, \beta)$ such that the following holds. Given an ellipsoid $\mathcal{E}_a(R)$, 
we define 

\[ k^*_n := \inf \left\{ k \ge 1 , \ a_k^2 \lambda_k R^2 \le \frac{\sqrt{k}}{n} \right\} . \tag{11} \]

Assume that $\alpha \ge e^{-\sqrt{n}}$, $\beta \ge C(\gamma) / \log(n)$, $n \ge C_2$, and $k^*_n \le n^{1/4} / \log^4(n)$. Then, the 
test $T_{\alpha, k^*_n}$ has a size smaller than $\alpha + C_3(\alpha, \gamma) / \log(n)$ and is minimax over $\mathcal{E}_a(R)$: 

\[ \beta\left[ T_{\alpha, k^*_n}; \left\{ \theta \in \mathcal{E}_a(R), \sigma > 0, \frac{\|\Gamma^{1/2}\theta\|^2}{\sigma^2} \ge C_4(\alpha, \beta)\, \rho^2_{a,R,n} \right\} \right] \le \beta . \tag{12} \]

This corollary is a straightforward consequence of Theorem 4.1. Hence, the test $T_{\alpha, k^*_n}$ is 
minimax over $\mathcal{E}_a(R)$, that is, its $(\alpha, \beta)$-separation distance equals (up to a multiplicative 
constant) the $(\alpha, \beta)$-minimax separation distance. Interestingly, the upper bound (12) 
does not require the error $\varepsilon$ to be normally distributed. 

Remark 4.4. As a consequence, the $(\alpha, \beta)$-minimax separation distance over $\mathcal{E}_a(R)$ is of 
order 

\[ \rho^2_{a,R,n} := \sup_{k \ge 1} \left[ \left( C(\alpha, \beta) \frac{\sqrt{k}}{n} \right) \wedge \big( R^2 a_k^2 \lambda_k \big) \right] . \]

It depends on the behavior of the non-increasing sequence $(\lambda_k a_k^2)$, where the sequence 
of eigenvalues $(\lambda_k)$ prescribes the "regularity" of the process $X$ and the sequence $(a_k)$ 
prescribes the regularity of $\theta$. In order to grasp the quantity $\rho^2_{a,R,n}$, let us specify some 
examples of sequences $\lambda_k a_k^2$: 

Corollary 4.6. Polynomial decay. If $\lambda_k a_k^2 = k^{-s}$ with $s > 7/2$, then the $(\alpha, \beta)$-minimax 
separation distance is of order $R^{2/(1+2s)} n^{-2s/(1+2s)}$. This rate is achieved by the test 
$T_{\alpha,k}$ with $k \asymp (R^2 n)^{2/(1+2s)}$. 

Exponential decay. If $\lambda_k a_k^2 = e^{-sk}$ with $s > 0$, then the $(\alpha, \beta)$-separation distance 
of $T_{\alpha,k}$ over $\mathcal{E}_a(R)$ is of order $\sqrt{\log n} / (\sqrt{s}\, n)$. This rate is achieved by the test $T_{\alpha,k}$ with $k \asymp 
\log(n)/s$. 
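
The dimension $k^*_n$ of (11) is easy to evaluate for a given weight sequence. The following sketch assumes that $k \mapsto a_k^2 \lambda_k$ is supplied as a Python function (a hypothetical interface) and scans the integers until the inequality in (11) holds; the resulting $\sqrt{k^*_n}/n$ gives the order of the squared separation distance, up to constants.

```python
import math


def k_star(weight, radius, n, k_max=10 ** 6):
    """Smallest k >= 1 with weight(k) * R^2 <= sqrt(k) / n, as in (11)."""
    for k in range(1, k_max + 1):
        if weight(k) * radius ** 2 <= math.sqrt(k) / n:
            return k
    raise ValueError("no admissible dimension below k_max")


def separation_order(weight, radius, n):
    """Order sqrt(k*) / n of the squared separation distance, up to constants."""
    return math.sqrt(k_star(weight, radius, n)) / n
```

For the polynomial regime with $s = 4$, $R = 1$ and $n = 10^6$, the scan returns $k^*_n = 22$, in line with $(R^2 n)^{2/(1+2s)} \approx 21.5$; for the exponential regime with $s = 1$ it returns $13$, close to $\log(n)/s \approx 13.8$.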

Remark 4.5. The condition $s > 7/2$ in the polynomial regime arises because of 
Assumption B.3 ($k \le n^{1/4} / \log^4(n)$). This restriction is related to the difficulty of reliably 
estimating the eigenvalues $\lambda_k$ and eigenfunctions $V_k$ when $k$ is large (see Lemma 9.7 and 
its proof). If the process $X$ and the function $\theta$ are less regular ($s < 7/2$), our theory only 
allows us to take $k = n^{1/4} / \log^4(n)$ in $T_{\alpha,k}$, which leads to a rate of testing of order (up to 
log terms) $n^{-s/4} R^2$, while the minimax lower bound is of order $R^{2/(1+2s)} n^{-2s/(1+2s)}$. Note 
that similar restrictions also occur in state-of-the-art results for estimation. For instance, 
Condition (3.3) in Hall and Horowitz [22] amounts to $s > 3$. 

In conclusion, $T_{\alpha,k}$ achieves the optimal rate of detection when $k$ is suitably chosen. 
However, the choice of $k$ depends on unknown quantities such as the regularity of $X$ or 
the regularity of $\theta$. Taking $k$ too small prevents the detection of non-zero $\theta$ for which 
the bias $\|(\Gamma^{1/2} - \Gamma_k^{1/2})\theta\|^2$ in (7) is too large. In contrast, taking $k$ too large leads to a 
large variance term $\sqrt{k}/n$ in (7). The best $k$ corresponds to the trade-off between the bias 
term and the variance term in (7). In the following, we introduce a procedure that nearly 
achieves this trade-off without requiring any prior knowledge of the regularity of $\Gamma$ or $\theta$. 

5. A multiple testing procedure 
5.1. Definition 

In the sequel, $\mathcal{K}_n$ stands for a "dyadic" collection of dimensions defined by 

\[ \mathcal{K}_n = \{2^0, 2^1, 2^2, 2^3, \ldots, k_n\} , \tag{13} \]

where $k_n$ is a power of 2 that will be fixed later. As $k$ cannot be chosen a priori, we 
evaluate the statistic $\phi_k(\mathbf{Y}, \mathbf{X})$ for all $k$ belonging to the collection $\mathcal{K}_n$. This choice of the 
collection $\mathcal{K}_n$ is discussed in the next section. 

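Definition 5.1 (KL-Test). We reject $H_0$: "$\theta = 0$" when the statistic 

\[ T_\alpha := \sup_{k \in \mathcal{K}_n ,\ k \le \mathrm{Rank}(\Gamma_n)} \left[ \phi_k(\mathbf{Y}, \mathbf{X}) - \hat k^{KL}\, \bar F^{-1}_{\hat k^{KL},\, n - \hat k^{KL}}\{\alpha_{\mathcal{K}_n}(\mathbf{X})\} \right] \tag{14} \]

is positive, where the weight $\alpha_{\mathcal{K}_n}(\mathbf{X})$ is chosen according to one of the procedures $P_1$ and 
$P_2$ explained below. 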

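$P_1$: (Bonferroni) $\alpha_{\mathcal{K}_n}(\mathbf{X})$ is equal to $\alpha / |\mathcal{K}_n|$. 

$P_2$: Let $\mathbf{Z}$ be a standard Gaussian vector of size $n$. We take $\alpha_{\mathcal{K}_n}(\mathbf{X}) = q_{\mathbf{X},\alpha}$, the $\alpha$-quantile 
of the distribution of the random variable 

\[ \inf_{k \in \mathcal{K}_n} \bar F_{\hat k^{KL},\, n - \hat k^{KL}}\left[ \phi_k(\mathbf{Z}, \mathbf{X}) / \hat k^{KL} \right] \tag{15} \]

conditionally to $\mathbf{X}$. Here, $\bar F_{D,N}(u)$ denotes the probability that a Fisher variable with $D$ and $N$ degrees of freedom is larger than $u$. 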

In the sequel, $T_\alpha^{(1)}$ (resp. $T_\alpha^{(2)}$) refers to the statistic $T_\alpha$ defined with Procedure $P_1$ 
(resp. $P_2$). $T_\alpha^{(1)}$ corresponds to a Bonferroni multiple testing procedure. In contrast, $T_\alpha^{(2)}$ 
handles better the dependence between the statistics $\phi_k$, by using an ad hoc quantile $q_{\mathbf{X},\alpha}$. 
We compare these two tests in Section 5.3. This multiple testing approach has already 
been considered in the non-parametric fixed design regression setting [3]. 

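Remark 5.1. [Computation of $q_{\mathbf{X},\alpha}$] Let $\mathbf{Z}$ be a standard Gaussian random vector of 
size $n$ independent of $\mathbf{X}$. As $\varepsilon$ is independent of $\mathbf{X}$, the distribution of (15) conditionally 
to $\mathbf{X}$ is the same as the distribution of 

\[ \inf_{k \in \mathcal{K}_n} \bar F_{\hat k^{KL},\, n - \hat k^{KL}}\left( \frac{\|\hat\Pi_k \mathbf{Z}\|_n^2 / \hat k^{KL}}{\|\mathbf{Z} - \hat\Pi_k \mathbf{Z}\|_n^2 / (n - \hat k^{KL})} \right) \]

conditionally to $\mathbf{X}$. As a consequence, one can simulate a random variable that follows 
the same distribution as (15) conditionally to $\mathbf{X}$. Hence, the quantile $q_{\mathbf{X},\alpha}$ is easily worked 
out by a Monte Carlo approach. 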
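
The Monte Carlo computation of Remark 5.1 may be sketched as follows. The sketch relies on two assumptions that are not stated in the text: first, that $k_n \le \mathrm{Rank}(\Gamma_n)$, so that $\hat k^{KL} = k$ for every $k \in \mathcal{K}_n$; second, since the projectors $\hat\Pi_k$ are nested, that the coordinates of $\mathbf{Z}$ in an orthonormal basis adapted to them can be simulated directly as i.i.d. standard Gaussian variables (rotation invariance of $\mathbf{Z}$). The Fisher tail probability is computed through a standard continued-fraction evaluation of the regularized incomplete beta function. All function names are hypothetical.

```python
import math
import random


def _beta_cf(a, b, x, max_iter=500, eps=1e-14):
    """Continued fraction of the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    c, h = 1.0, d
    for m in range(1, max_iter + 1):
        even = m * (b - m) * x / ((a + 2 * m - 1.0) * (a + 2 * m))
        odd = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1.0))
        for term in (even, odd):
            d = 1.0 + term * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + term / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _beta_cf(a, b, x) / a
    return 1.0 - bt * _beta_cf(b, a, 1.0 - x) / b


def fisher_sf(x, d1, d2):
    """P(F > x) for a Fisher variable with (d1, d2) degrees of freedom."""
    if x <= 0.0:
        return 1.0
    return reg_inc_beta(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * x))


def min_pvalue(coords, dims):
    """Statistic (15) from the coordinates of Z in the adapted basis."""
    n, total = len(coords), sum(g * g for g in coords)
    best, head, j = 1.0, 0.0, 0
    for k in dims:  # dims must be increasing
        while j < k:
            head += coords[j] ** 2
            j += 1
        ratio = (head / k) / ((total - head) / (n - k))
        best = min(best, fisher_sf(ratio, k, n - k))
    return best


def simulate_quantile(n, dims, alpha, n_sim=1000, seed=0):
    """Empirical alpha-quantile of (15) over n_sim Gaussian draws."""
    rng = random.Random(seed)
    sims = sorted(min_pvalue([rng.gauss(0.0, 1.0) for _ in range(n)], dims)
                  for _ in range(n_sim))
    return sims[max(0, int(math.ceil(alpha * n_sim)) - 1)]
```

With this quantile in hand, rejecting when (14) is positive amounts to rejecting when the smallest of the tail probabilities $\bar F_{k, n-k}(\phi_k(\mathbf{Y}, \mathbf{X})/k)$, $k \in \mathcal{K}_n$, falls below the simulated value. The accuracy of the simulated quantile is governed by `n_sim`.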

Remark 5.2. [Choice of $k_n$] In practice, we advise taking $k_n = 2^{\lfloor \log_2 n \rfloor - 1}$, which lies 
between $n/4$ and $n/2$. This choice is supported by practical experience and by the results obtained 
in Section 5.2 and Appendix A. Nevertheless, some of the theoretical results will require 
taking a slightly smaller value for $k_n$. 
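
The dyadic collection (13), the practical choice of $k_n$ in Remark 5.2, and the Bonferroni weight of Procedure $P_1$ can be computed as in the following sketch. The function names are hypothetical, and the floor of the binary logarithm is obtained with integer arithmetic to avoid rounding issues.

```python
def largest_dimension(n):
    """k_n = 2 ** (floor(log2(n)) - 1), for an integer n >= 2."""
    return 1 << (n.bit_length() - 2)


def dyadic_collection(n):
    """Collection {1, 2, 4, ..., k_n} of (13)."""
    dims, k = [], 1
    while k <= largest_dimension(n):
        dims.append(k)
        k *= 2
    return dims


def bonferroni_level(alpha, n):
    """Weight alpha / |K_n| used by Procedure P1."""
    return alpha / len(dyadic_collection(n))
```

For $n = 100$, this gives $k_n = 32$, the collection $\{1, 2, 4, 8, 16, 32\}$ and a Bonferroni weight of $\alpha/6$.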

5.2. Size of the tests 

Proposition 5.2 (Size of $T_\alpha^{(1)}$ and $T_\alpha^{(2)}$ under Gaussian errors). Under Assumption A.1 
and if $k_n \le n/2$, we have for any $n \ge 2$, 

\[ \mathbb{P}_0(T_\alpha^{(1)} > 0) \le \alpha , \qquad \mathbb{P}_0(T_\alpha^{(2)} > 0) = \alpha . \]

If the noise $\varepsilon$ follows a Gaussian distribution, the size of $T_\alpha^{(2)}$ is exactly $\alpha$, while the 
size of $T_\alpha^{(1)}$ is smaller than $\alpha$ because of the Bonferroni correction. Let us now control the 
size of $T_\alpha^{(1)}$ assuming that $\varepsilon$ admits a finite fourth moment. 

B'.3 $k_n \le n^{1/4} / \log^4(n)$. 

Assumption B'.3 is the counterpart of B.3 for a multiple testing procedure. Next, we 
state the counterpart of Theorem 3.4 for $T_\alpha^{(1)}$. 

Theorem 5.3 (Size of $T_\alpha^{(1)}$). Under Assumptions B.1, B.2, and B'.3, there exist positive 
constants $C(\alpha, \gamma)$ and $C_2$ such that the following holds. For any $n \ge C_2$, we have 

\[ \mathbb{P}_0\big[T_\alpha^{(1)} > 0\big] \le \alpha + \frac{C(\alpha, \gamma)}{\log(n)} . \]

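5.3. Comparison of $T_\alpha^{(1)}$ and $T_\alpha^{(2)}$ 

The test $T_\alpha^{(2)}$ is always more powerful than $T_\alpha^{(1)}$, as shown in the next proposition. 

Proposition 5.4. For any parameter $\theta \neq 0$, the tests $T_\alpha^{(1)}$ and $T_\alpha^{(2)}$ satisfy 

\[ \mathbb{P}_\theta\big(T_\alpha^{(2)} > 0 \,\big|\, \mathbf{X}\big) \ge \mathbb{P}_\theta\big(T_\alpha^{(1)} > 0 \,\big|\, \mathbf{X}\big) \quad \mathbf{X}\text{-a.s.} \tag{16} \]

On one hand, the choice of Procedure $P_1$ is valid even for a non-Gaussian noise and 
avoids the computation of the quantile $q_{\mathbf{X},\alpha}$. On the other hand, the test $T_\alpha^{(2)}$ has a size 
exactly $\alpha$ when the error is Gaussian and is more powerful than the corresponding test 
with Procedure $P_1$. This comparison is numerically illustrated in Section 7. 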



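6. Power and adaptation of $T_\alpha^{(1)}$ 

Since $T_\alpha^{(2)}$ is always more powerful than $T_\alpha^{(1)}$, we only consider the power and the minimax 
optimality of $T_\alpha^{(1)}$. 

Theorem 6.1 (Power under non-Gaussian errors). Let $\alpha$ and $\beta$ be fixed. Under B.1, B.2, 
B'.3, and B.4, there exist positive constants $C(\gamma)$, $C_1$, $C_2$, and $C_3$ such that the following 
holds. Assume that $\alpha \ge e^{-\sqrt{n}}$, $\beta \ge C(\gamma) / \log(n)$, and that $n \ge C_3$. Then, $\mathbb{P}_\theta(T_\alpha^{(1)} > 0) \ge 
1 - \beta$ for any $\theta$ satisfying 

\[ \|\Gamma^{1/2}\theta\|^2 \ge \inf_{k \in \mathcal{K}_n} \left[ C_1 \big\|(\Gamma^{1/2} - \Gamma_k^{1/2})\theta\big\|^2 + C_2 \frac{\sigma^2}{n} \left( \sqrt{k \log\Big(\frac{\log n}{\alpha\beta}\Big)} + \log\Big(\frac{\log n}{\alpha\beta}\Big) \right) \right] . \tag{17} \]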

Remark 6.1. Comparing Theorems 4.1 and 6.1, we observe that the rejection region of $T_\alpha^{(1)}$ almost contains all the rejection regions of the tests $T_{\alpha,k}$ for all $k \in \mathcal{K}_n$. The price to pay for this feature is an additional $\sqrt{\log\log n}$ in the variance term of (17).

This $\log\log(n)$ term corresponds to the quantity $\log(|\mathcal{K}_n|)$. If we had used a collection of the form $\{1, \ldots, k_n\}$ instead of $\mathcal{K}_n$, the $\log\log(n)$ would have been replaced by a $\log(n)$. We prove below that this $\log\log n$ term is in fact unavoidable for an adaptive procedure.
As for $T_{\alpha,k}$, additional controls of the power when $\epsilon$ follows a Normal distribution are stated in Appendix A. Let us now consider the power of $T_\alpha^{(1)}$ over ellipsoids $\mathcal{E}_a(R)$. In the sequel, $\lfloor . \rfloor$ stands for the integer part, while $\log_2(.)$ corresponds to the binary logarithm.
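To make the role of $\log(|\mathcal{K}_n|)$ concrete, the following minimal sketch builds a dyadic collection $\{1, 2, 4, \ldots\}$ up to a given maximal dimension and the corresponding Bonferroni level $\alpha/|\mathcal{K}_n|$. The function names and the argument `k_max` are illustrative assumptions, not notation from the paper.

```python
def dyadic_collection(k_max):
    """Return the dyadic dimensions [1, 2, 4, ...] that do not exceed k_max."""
    k_max = int(k_max)
    if k_max < 1:
        raise ValueError("k_max must be at least 1")
    # bit_length() - 1 is the largest exponent j with 2**j <= k_max
    return [2 ** j for j in range(k_max.bit_length())]


def bonferroni_level(alpha, collection):
    """Level used for each individual test when the level alpha is split evenly."""
    return alpha / len(collection)


dims = dyadic_collection(100)
level = bonferroni_level(0.05, dims)
```

The collection has about $\log_2(k_{\max})$ elements, whereas the full collection $\{1, \ldots, k_{\max}\}$ has $k_{\max}$ elements; this is the source of the $\log\log(n)$ versus $\log(n)$ difference discussed in Remark 6.1.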

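Corollary 6.2 (Power of $T_\alpha^{(1)}$ over ellipsoids). Under B.1, B.2, and B.4, there exist positive constants $C(\gamma)$, $C_1$, $C_2$, $C_3(\alpha,\beta)$, and $C_4(\alpha,\beta)$ such that the following holds. Assume that $\alpha \ge e^{-\sqrt{n}}$, that $\beta \ge C(\gamma)/\log(n)$, and $n \ge C_1$. Consider the test $T_\alpha^{(1)}$ with $k_n = 2^{\lfloor \log_2[n^{1/4}/\log^4(n)] \rfloor}$. Fix any ellipsoid $\mathcal{E}_a(R)$.

1. We have $\mathbb{P}_\theta(T_\alpha^{(1)} > 0) \ge 1 - \beta$ for any $\theta \in \mathcal{E}_a(R)$ satisfying

$$\|\Gamma^{1/2}\theta\|^2 \ge C_3(\alpha,\beta) \inf_{k = 1, 2, 4, \ldots, k_n} \left[ \lambda_{k+1} a_{k+1}^2 R^2 + \frac{\sigma^2}{n}\Big( \sqrt{k \log\log n} + \log\log n \Big) \right] .$$

2. Consider $k^*$ as in (11). If $\log\log(n) \le k^* \le k_n$, then $\mathbb{P}_\theta(T_\alpha^{(1)} > 0) \ge 1 - \beta$ for any $\theta \in \mathcal{E}_a(R)$ satisfying

$$\|\Gamma^{1/2}\theta\|^2 \ge C_4(\alpha,\beta) \sqrt{\log\log n}\; \rho^2_{a,R,n} ,$$

where $\rho_{a,R,n}$ is defined in (10).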

This is a direct consequence of Theorem 4.1. 

Remark 6.2. If we compare Corollary 6.2 with the minimax lower bound of Proposition 4.4, we observe that the separation distance only matches up to a factor of order $\sqrt{\log\log n}$. As a consequence, $T_\alpha^{(1)}$ is almost minimax over all ellipsoids $\mathcal{E}_a(R)$ satisfying $\log\log(n) \le k^* \le k_n$. Next, we prove that this $\sqrt{\log\log(n)}$ loss is unavoidable when the ellipsoid $\mathcal{E}_a(R)$ is unknown.

Proposition 6.3 (Minimax lower bounds over a collection of nested ellipsoids). There exists a positive constant $C(\alpha,\beta)$ such that the following holds. Let us assume that $X$ is a Gaussian process, that the noise $\epsilon$ follows a Gaussian distribution, and that the rank of $\Gamma$ is infinite. For any ellipsoid $\mathcal{E}_a(R)$, we set



~2 

Pa,R,; 



sup 

fe>l 



C(a,P) 



^loglog(fcV3)v^ N 



A (R 2 a 2 k \ k 



For any non-increasing sequence $(a_k)_{k \ge 1}$ and any test $T$ of level $\alpha$, we have

|rV20||2 



/3 



T- |J j#e£ Q (i?),a>0, 

R>0 



>~Pa 



R,n 



>/3 



As a consequence, there is a $\sqrt{\log\log n}$ price to pay if we simultaneously consider a nested collection of ellipsoids. Such impossibility for perfect adaptation has already been observed for the testing problem in the classical nonparametric regression framework [35].

Remark 6.3. In order to compare the lower and upper bounds of Proposition 6.3 and Corollary 6.2, let us specify the sequence $\lambda_k a_k^2$:

• Polynomial decay. If $\lambda_k a_k^2 = k^{-s}$, then the $(\alpha,\beta)$-separation distance of $T_\alpha^{(1)}$ over $\mathcal{E}_a(R)$ is of order

$$R^{2/(1+2s)} \left( \frac{\sqrt{\log\log(n)}}{n} \right)^{2s/(1+2s)}$$

for $s > 7/2$. By Proposition 6.3, this rate is optimal for adaptation.
• Exponential decay. If $\lambda_k a_k^2 = e^{-sk}$, then the $(\alpha,\beta)$-separation distance of $T_\alpha^{(1)}$ over $\mathcal{E}_a(R)$ is of order

$$\frac{\sqrt{\log(n)\,\log\log(n)}}{\sqrt{s}\, n}$$

for any $s > 0$. By Proposition 6.3, this rate is almost optimal for adaptation (up to a $\sqrt{\log\log(n)/\log\log\log n}$ term).
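The bias-variance trade-off behind these rates can be explored numerically. The sketch below assumes that, up to constants, the bound to minimise over the dyadic grid has the form $\lambda_{k+1} a_{k+1}^2 R^2 + (\sigma^2/n)(\sqrt{k\log\log n} + \log\log n)$; this form, the function name and all arguments are assumptions made for illustration, and the constants of the theoretical statements are dropped.

```python
import math


def separation_bound(lam_a2, radius, sigma2, n, k_max):
    """Minimise bias(k) + variance(k) over the dyadic dimensions k <= k_max.

    lam_a2(k) should return lambda_k * a_k**2 (assumed non-increasing in k).
    Returns the minimising dimension and the value of the bound.
    """
    loglog = math.log(math.log(n))  # requires n >= 3 so that loglog > 0
    best_k, best_value = None, float("inf")
    k = 1
    while k <= k_max:
        bias = lam_a2(k + 1) * radius ** 2
        variance = sigma2 / n * (math.sqrt(k * loglog) + loglog)
        if bias + variance < best_value:
            best_k, best_value = k, bias + variance
        k *= 2
    return best_k, best_value


# Polynomial decay lambda_k * a_k**2 = k**(-s) with s = 4
best_k, best_value = separation_bound(lambda k: k ** -4.0, 1.0, 1.0, 10 ** 6, 64)
```

A faster decay of $\lambda_k a_k^2$ moves the minimiser towards smaller dimensions, which is how the decay of this sequence enters the rates above.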

In conclusion, the procedure $T_\alpha^{(1)}$ is adaptive to the unknown regularity of $\theta$, to the unknown regularity of the eigenvalues $(\lambda_k)_{k \ge 1}$ and to the unknown noise variance $\sigma^2$. Interestingly, the minimax rate of testing depends on the decay of the non-increasing sequence $(\lambda_k a_k^2)_{k \ge 1}$.



7. Simulations 



7.1. Experiments

Setting. The performances of the procedures $T_\alpha^{(1)}$ and $T_\alpha^{(2)}$ are illustrated for various choices of the function $\theta$. In all experiments, the noise $\epsilon$ follows a standard Gaussian distribution (zero mean, unit variance), while the process $X$ is a Brownian motion defined on $[0,1]$. The eigenfunctions and eigenvalues of the covariance operator of the Brownian motion have been computed in Ash & Gardner [1]:

$$\lambda_j = \frac{1}{(j - 0.5)^2 \pi^2} \quad \text{and} \quad V_j(t) = \sqrt{2}\, \sin\big((j - 0.5)\pi t\big), \quad t \in [0,1], \quad j = 1, 2, \ldots$$

In practice $X(t)$ has been simulated using a truncated version of the Karhunen-Loève expansion $\sum_{j \ge 1} \sqrt{\lambda_j}\, \eta_j V_j(t)$, where the $(\eta_j)_{j \in \mathbb{N}}$ form an i.i.d. sequence of standard normal variables. The function $X(t)$ is observed on 1000 evenly spaced points in $[0,1]$.
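The simulation of the covariate can be sketched as follows. This is a minimal illustration using only the Python standard library; the truncation level `n_terms=50` and the seed are assumptions, since the truncation level actually used is not recoverable from the text, and the original study was run in R.

```python
import math
import random


def bm_eigenvalue(j):
    """Eigenvalue lambda_j = 1 / ((j - 0.5)^2 * pi^2) of the Brownian covariance."""
    return 1.0 / ((j - 0.5) ** 2 * math.pi ** 2)


def bm_eigenfunction(j, t):
    """Eigenfunction V_j(t) = sqrt(2) * sin((j - 0.5) * pi * t) on [0, 1]."""
    return math.sqrt(2.0) * math.sin((j - 0.5) * math.pi * t)


def simulate_brownian_path(grid, n_terms, rng):
    """Draw one path from the Karhunen-Loeve expansion truncated at n_terms."""
    scores = [math.sqrt(bm_eigenvalue(j)) * rng.gauss(0.0, 1.0)
              for j in range(1, n_terms + 1)]
    return [sum(s * bm_eigenfunction(j, t) for j, s in enumerate(scores, start=1))
            for t in grid]


# 1000 evenly spaced observation points in [0, 1], as in the experiments
grid = [i / 999 for i in range(1000)]
path = simulate_brownian_path(grid, n_terms=50, rng=random.Random(0))
```

The eigenvalues sum to $1/2$, the trace of the Brownian covariance operator on $[0,1]$, which gives a quick sanity check on the truncation level.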

Testing procedure. For each experiment, we perform the tests T^p (procedure Pi) 
and TcP (procedure P%) with k n = 2 L log 2 X J _ The quantile gx,a involved in P2 is com- 
puted by Monte Carlo simulations. For each experiment, we use 1000 random simulations 
to estimate this quantile. 

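Choice of $\theta$.

1. In the first experiment, we fix $\theta = 0$ as a way to evaluate the sizes of the testing procedures.

2. In the second experiment, we build directly the function $\theta$ in the KL basis of $X$. The set $\Theta_{KL}$ is made of all the functions $\theta_{B,\xi}$ with $B > 0$, $\xi > 0$, and

$$\theta_{B,\xi}(t) := \frac{B}{\sqrt{\sum_{k=1}^{100} k^{-2\xi-1}}} \sum_{j=1}^{100} j^{-\xi-0.5}\, V_j(t) , \qquad (18)$$

where $\xi$ is a smoothness parameter. Observe that $B$ stands for the $L^2$ norm of the function $\theta_{B,\xi}$. As shown on Figure 1, the smoothness of $\theta_{B,\xi} \in \Theta_{KL}$ increases with $\xi$. For this experiment, we have an explicit expression of the joint regularity of $\theta$ and $\Gamma$:

$$\big\|(\Gamma^{1/2} - \Gamma_k^{1/2})\theta_{B,\xi}\big\|^2 = \frac{B^2}{\sum_{j=1}^{100} j^{-2\xi-1}} \sum_{j=k+1}^{100} \frac{j^{-2\xi-1}}{\pi^2 (j - 0.5)^2} .$$

In practice, we fix $\xi = 0.1, 0.5, 1$ and $B = 0.1, 0.5, 1$.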
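The construction of the second experiment can be checked numerically. The sketch below assumes that the coefficients of $\theta_{B,\xi}$ in the Karhunen-Loève basis are proportional to $j^{-\xi-0.5}$ and rescaled so that the $L^2$ norm equals $B$, which is what the statement "B stands for the $L^2$ norm" implies; the function names are illustrative.

```python
import math


def theta_kl_coefficients(B, xi, n_terms=100):
    """Coefficients of theta_{B,xi} in the KL basis, rescaled to have L2 norm B."""
    raw = [j ** (-xi - 0.5) for j in range(1, n_terms + 1)]
    norm = math.sqrt(sum(c * c for c in raw))
    return [B * c / norm for c in raw]


def tail_regularity(B, xi, k, n_terms=100):
    """Sum over j > k of lambda_j * <theta, V_j>^2 for the Brownian eigenvalues."""
    coeffs = theta_kl_coefficients(B, xi, n_terms)
    return sum(coeffs[j - 1] ** 2 / ((j - 0.5) ** 2 * math.pi ** 2)
               for j in range(k + 1, n_terms + 1))


coeffs = theta_kl_coefficients(0.5, 0.1)
tails = [tail_regularity(1.0, 0.5, k) for k in (0, 1, 2, 4, 8)]
```

The value at `k = 0` is $\|\Gamma^{1/2}\theta_{B,\xi}\|^2$, and the sequence decreases with `k`; larger values of $\xi$ make it decrease faster.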
Figure 1. Three functions $\theta$ in $\Theta_{KL}$ when $B = 1$.



3. In the third experiment, we consider the set $\Theta_G$ of functions

$$\theta_{B,\tau}(t) = B \exp\left[ -\frac{(t - 0.5)^2}{2\tau^2} \right] \left\{ \int_0^1 \exp\left[ -\frac{(x - 0.5)^2}{\tau^2} \right] dx \right\}^{-1/2}$$

with $B > 0$ and $\tau > 0$. Here, $B$ stands for the $L^2$ norm of $\theta_{B,\tau}$ and $\tau$ is a smoothness parameter. In fact, $\theta_{B,\tau}(t)$ corresponds (up to a constant) to the density of a normal variable with mean 0.5 and variance $\tau^2$. As $\tau$ decreases to 0, $\theta_{B,\tau}$ converges to a Dirac function centered on 0.5. In practice, we fix $\tau = 0.01, 0.02, 0.05$ and $B = 0.5, 1, 2$.
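The normalisation of the third family can be illustrated on a grid. In this sketch the normalising integral is replaced by a trapezoidal approximation on the observation grid, which is an assumption made for simplicity; the helper names are hypothetical.

```python
import math


def trapezoid(values, step):
    """Trapezoidal rule for equally spaced values."""
    return step * (sum(values) - 0.5 * (values[0] + values[-1]))


def gaussian_bump(B, tau, n_grid=1000):
    """Evaluate a Gaussian-shaped slope centred at 0.5, rescaled to L2 norm B."""
    step = 1.0 / (n_grid - 1)
    grid = [i * step for i in range(n_grid)]
    raw = [math.exp(-(t - 0.5) ** 2 / (2.0 * tau ** 2)) for t in grid]
    norm = math.sqrt(trapezoid([r * r for r in raw], step))
    return grid, [B * r / norm for r in raw]


grid, theta = gaussian_bump(1.0, 0.05)
step = grid[1] - grid[0]
l2_norm = math.sqrt(trapezoid([v * v for v in theta], step))
```

For small $\tau$ the normalising integral is close to $\tau\sqrt{\pi}$, so the peak height is roughly $B/\sqrt{\tau\sqrt{\pi}}$ and grows as $\tau$ decreases, which matches the Dirac-like behaviour described above.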



Number of experiments. We have set $n = 100$ and $n = 500$. For each set of parameters $(n, B, \xi)$ or $(n, B, \tau)$, 10 000 trials were run to estimate the percentages of rejection of $H_0$ (i.e. the percentages of positive values of $T_\alpha^{(1)}$ and $T_\alpha^{(2)}$ with $\alpha = 5\%$), along with their 95% confidence intervals.
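The half-widths reported in the tables are consistent with a normal-approximation interval for a binomial proportion. The sketch below is written under that assumption (the text does not state which interval was used); the function name is illustrative.

```python
import math


def ci_half_width(rejection_pct, n_trials, z=1.959964):
    """Half-width, in percentage points, of a normal-approximation 95% interval."""
    p = rejection_pct / 100.0
    return 100.0 * z * math.sqrt(p * (1.0 - p) / n_trials)


half_width = ci_half_width(3.47, 10000)
```

With 10 000 trials and an estimated rejection rate of 3.47%, this gives a half-width of about 0.36 percentage points.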



7.2. Results 

The two procedures Pi and P2 have been implemented in R [33] on a 3 GHz Intel Xeon 
processor, with a 4000KB cache size and 8GB total physical memory. 

Table 1

First simulation study: Null hypothesis is true. Percentages of rejection of $H_0$ and 95% confidence intervals

                        n = 100          n = 500
$T_\alpha^{(1)}$        3.47 (± 0.36)    2.61 (± 0.31)
$T_\alpha^{(2)}$        4.97 (± 0.43)    5.26 (± 0.44)

First setting. The percentages of rejection of $T_\alpha^{(1)}$ and $T_\alpha^{(2)}$ under $H_0$ with $n = 100$ and $n = 500$ are provided in Table 1. As expected, the size of $T_\alpha^{(1)}$ decreases when $n$ increases because we pay a price for the Bonferroni correction. The size of $T_\alpha^{(2)}$ remains close to the nominal level $\alpha = 5\%$.
Second setting. Tables 2 and 3 depict the results for $\theta \in \Theta_{KL}$ with $n = 100$ and $n = 500$ respectively. As expected, the power of the procedures is increasing with $B$ as $\|\theta_{B,\xi}\|$ becomes larger. Furthermore, the power also increases with $\xi$. This corroborates the rates stated in Section 6, since the function $\theta_{B,\xi}$ becomes smoother when $\xi$ increases. In every setting the test with the second procedure performs better than $T_\alpha^{(1)}$.

Third setting. The results of the last experiment are provided in Table 4 for $n = 100$ and Table 5 for $n = 500$. Again, the power is increasing with $B$, $n$ and $\tau$. Here, $\tau$ does not directly correspond to the rate of convergence of the sequence $\big(\int_0^1 \theta_{B,\tau}(t) V_j(t)\,dt\big)_{j \ge 1}$ as $\xi$ does in the last example. Nevertheless, it is difficult to detect a function $\theta_{B,\tau}$ when $\tau$ decreases, that is when $\theta_{B,\tau}$ becomes close to a Dirac function.

In each setting, the test under $P_2$ is more powerful than the test under $P_1$. Nevertheless, the procedure $P_2$ is slightly slower to compute as it requires the evaluation of the quantile $q_{\mathbf{X},\alpha}$ by a Monte-Carlo method. Under $P_1$, the mean computation time is 9 seconds for $n = 100$ and 12 seconds for $n = 500$. In contrast, it respectively equals 11 and 18 seconds under $P_2$.

Table 2

Second simulation study: $\theta \in \Theta_{KL}$, $n = 100$. Percentages of rejection of $H_0$ and 95% confidence intervals

                                 B = 0.1          B = 0.5           B = 1
ξ = 0.1    $T_\alpha^{(1)}$      3.88 (± 0.38)    21.41 (± 0.8)     77.24 (± 0.82)
           $T_\alpha^{(2)}$      5.8 (± 0.46)     26.38 (± 0.86)    81.78 (± 0.76)
ξ = 0.5    $T_\alpha^{(1)}$      4.74 (± 0.42)    46.47 (± 0.98)    98.68 (± 0.22)
           $T_\alpha^{(2)}$      6.65 (± 0.49)    52.79 (± 0.98)    99.06 (± 0.19)
ξ = 1      $T_\alpha^{(1)}$      4.8 (± 0.42)     62.67 (± 0.95)    99.75 (± 0.1)
           $T_\alpha^{(2)}$      7.07 (± 0.5)     68.3 (± 0.91)     99.84 (± 0.08)

Table 3

Second simulation study: $\theta \in \Theta_{KL}$, $n = 500$. Percentages of rejection of $H_0$ and 95% confidence intervals

                                 B = 0.1           B = 0.5           B = 1
ξ = 0.1    $T_\alpha^{(1)}$      5.17 (± 0.43)     86.98 (± 0.66)    100 (± 0)
           $T_\alpha^{(2)}$      8.48 (± 0.55)     90.89 (± 0.56)    100 (± 0)
ξ = 0.5    $T_\alpha^{(1)}$      8.81 (± 0.56)     99.85 (± 0.08)    100 (± 0)
           $T_\alpha^{(2)}$      13.07 (± 0.66)    99.88 (± 0.07)    100 (± 0)
ξ = 1      $T_\alpha^{(1)}$      11.38 (± 0.62)    99.99 (± 0.02)    100 (± 0)
           $T_\alpha^{(2)}$      16.13 (± 0.72)    100 (± 0)         100 (± 0)

8. Discussion 

Two multiple testing procedures of the nullity of the slope function 9 have been proposed in 
this paper. They are completely data-driven and benefit from optimal properties assessed 
in a nonasymptotic setting. We address here some extensions of our results. 

Table 4

Third simulation study: $\theta \in \Theta_G$, $n = 100$. Percentages of rejection of $H_0$ and 95% confidence intervals

                                  B = 0.5           B = 1             B = 2
τ = 0.01    $T_\alpha^{(1)}$      4.94 (± 0.42)     11.85 (± 0.63)    46.69 (± 0.98)
            $T_\alpha^{(2)}$      7.25 (± 0.51)     15.49 (± 0.71)    53.56 (± 0.98)
τ = 0.02    $T_\alpha^{(1)}$      7.33 (± 0.51)     23.09 (± 0.83)    80.26 (± 0.78)
            $T_\alpha^{(2)}$      10 (± 0.59)       28.54 (± 0.89)    84.04 (± 0.72)
τ = 0.05    $T_\alpha^{(1)}$      13.85 (± 0.68)    56.51 (± 0.97)    99.48 (± 0.14)
            $T_\alpha^{(2)}$      18.13 (± 0.76)    63.09 (± 0.95)    99.65 (± 0.12)

Table 5

Third simulation study: $\theta \in \Theta_G$, $n = 500$. Percentages of rejection of $H_0$ and 95% confidence intervals

                                  B = 0.5           B = 1             B = 2
τ = 0.01    $T_\alpha^{(1)}$      12.41 (± 0.65)    54.6 (± 0.98)     99.75 (± 0.1)
            $T_\alpha^{(2)}$      17.99 (± 0.75)    63.16 (± 0.95)    99.98 (± 0.07)
τ = 0.02    $T_\alpha^{(1)}$      26.11 (± 0.86)    88.91 (± 0.62)    100 (± 0)
            $T_\alpha^{(2)}$      33.95 (± 0.93)    92.62 (± 0.51)    100 (± 0)
τ = 0.05    $T_\alpha^{(1)}$      65.38 (± 0.93)    99.95 (± 0.04)    100 (± 0)
            $T_\alpha^{(2)}$      72.74 (± 0.87)    99.99 (± 0.02)    100 (± 0)

Although we focused on the null hypothesis "$H_0$: $\theta = 0$", our approach easily extends to linear hypotheses $H_{0,V}$: "$\theta \in V$", where $V$ is a given finite dimensional subspace of $\mathcal{H}$ of dimension $p < n/2$. As previously, the procedure relies on parametric statistics for
testing H y against H\^y : u 9 e (Vect(Vi, . . . , Vfe)+V)\V", where k is a positive integer. 
We consider the n x k design matrix W defined by Wjj = (X^, Vj) for i = 1, . . . n, 
j = 1, . . . k KL . The space generated by the k KL columns of the matrix W is denoted 
Wf.KT,- Considering a basis (£i, . . . ,£ p ) of V, we define V p as the space generated by the 
p columns of the matrix whose (ij) th element is < Xj,£j >. In the sequel, Tlk,v stands 
for the orthogonal projection in K™ onto Vp D Y^j.ki. of dimension less or equal to k KL , 
while Ily stands for the orthogonal projection onto V p . Then, we consider the following 
parametric statistic: 

fc,v(Y,x):= . . (19) 

|| Y - n fejV Y - n v Y||2/[n - dim(V p + W kKL )] 

Under H y, 4> k y(Y ', ¥L)/k KL behaves like a Fisher distribution with (dim(V p r\W^ KL ), n— 
dim(V p + Wf.KL)) degrees of freedom. The proof is the same as that for </>fc(Y, X). In typ- 
ical situations, we have dim(Vp n W^ KL ) = k and dim(V p + Wy<T.) = k + p. We reject 
H ov when the statistic 



i a y := sup 

keK n , fc<Rank(f„) 



6 fe ,v(Y,X) - k KL F d .i nffi )in _ d +m _ KL) { aiCn (X.)} 



is positive, where the weight $\alpha_{\mathcal{K}_n}(\mathbf{X})$ is chosen according to procedure $P_1$ (Bonferroni) or a slight variation of $P_2$ (Monte-Carlo). All the results stated for $T_\alpha^{(1)}$ and $T_\alpha^{(2)}$ are still valid with $T_{\alpha,V}$. The extension to affine subspaces $V$ is also possible.

The power of $T_\alpha^{(1)}$ has been analysed over the collection of ellipsoids $\mathcal{E}_a(R)$. The considered ellipsoids describing the nonparametric alternatives are determined by the principal directions $(V_j)_{j \ge 1}$, which are generally unknown. In fact, for some functions that are well represented by a prescribed basis (such as a wavelet, spline or Fourier basis) and whose expansion in the eigenfunction basis decreases slowly, projecting the data onto the Karhunen-Loève expansion is not necessarily best suited. Alternatively, one can adopt a similar approach in the context of a prescribed basis instead of the eigenfunction basis discussed above. The size and the power of the corresponding procedures are in fact easier to derive than for a Karhunen-Loève approach as we do not have to control the randomness of the basis. We refer for instance to [3] for such results in a fixed design regression problem. As $\theta$ is unknown, the best choice of basis (prescribed or estimated by PCA) is also unknown. A solution is to combine testing procedures based on different bases.



9. Main proofs 

In this section, we emphasize the core of the proofs. Arguments based on perturbation 
theory are introduced in the next section. All the technical and side results are postponed 
to Appendix B-E. 



9.1. Additional notations 

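Given any integer $k \le \mathrm{Rank}(\Gamma)$, we recall that $\Gamma_k = \sum_{j=1}^{k} \lambda_j V_j \otimes V_j$, where $\otimes$ stands for the tensor product. Similarly, $\hat\Gamma_{n,k} := \sum_{j=1}^{k \wedge \mathrm{Rank}(\hat\Gamma_n)} \hat\lambda_j \hat V_j \otimes \hat V_j$ denotes its empirical counterpart. For any $k \le \mathrm{Rank}(\Gamma)$, we note $\Pi_k$ the orthogonal projection in $\mathcal{H}$ onto the space spanned by $V_j$, $j = 1, \ldots, k$, while $\hat\Pi_k$ stands for the orthogonal projection onto the space spanned by $\hat V_j$, $j = 1, \ldots, k \wedge \mathrm{Rank}(\hat\Gamma_n)$.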

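In order to translate the definition of the testing procedure into the functional data analysis framework, we shall use $\Delta = \mathbb{E}(\langle X, . \rangle Y)$. We note $\Delta_n = \sum_{i=1}^{n} \langle X_i, . \rangle Y_i / n$ its empirical counterpart. For any $k \le \mathrm{Rank}(\Gamma)$, we note $A_k = \sum_{j=1}^{k} \lambda_j^{-1/2} V_j \otimes V_j$ and $\hat A_k = \sum_{j=1}^{k \wedge \mathrm{Rank}(\hat\Gamma_n)} \hat\lambda_j^{-1/2} \hat V_j \otimes \hat V_j$ its empirical counterpart.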

Let $S$ be a bounded linear operator on the Hilbert space $\mathcal{H}$. The corresponding operator norm will be denoted $\|\cdot\|_\infty$ where $\|S\|_\infty = \sup_{x \in B(0,1)} \|S(x)\|$ and $B(0,1)$ stands for the unit ball of $\mathcal{H}$. Let $T$ be a Hilbert-Schmidt operator. $\|T\|_{HS}$ denotes the Hilbert-Schmidt norm and $\mathrm{tr}$ stands for the classical trace (defined for trace-class operators). We recall that $\|T\|_{HS}^2 = \mathrm{tr}(T^*T)$.

In the sequel, we note $\bar\chi_k(u)$ the probability that a $\chi^2$ variable with $k$ degrees of freedom is larger than $u$, while $\bar\chi_k^{-1}(u)$ denotes the $1 - u$ quantile of a $\chi^2$ random variable with $k$ degrees of freedom.



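9.2. Connection between $\phi_k(\mathbf{Y},\mathbf{X})$ and the procedure of Cardot et al. [6]

In fact, the numerator of the statistic $\phi_k$ is exactly the same as the test statistic $\|\sqrt{n} \hat A_k \Delta_n\|^2$ introduced by Cardot et al. [6], that is:

$$\phi_k(\mathbf{Y},\mathbf{X}) = \frac{\|\hat\Pi_k \mathbf{Y}\|^2}{\|\mathbf{Y} - \hat\Pi_k \mathbf{Y}\|^2 / (n - \hat k^{KL})} = \frac{\|\sqrt{n} \hat A_k \Delta_n\|^2}{\|\mathbf{Y} - \hat\Pi_k \mathbf{Y}\|^2 / (n - \hat k^{KL})} . \qquad (20)$$

Proof of Equation (20). Consider the least-squares estimator $\hat\theta_k$ of $\theta$ in the space generated by $\hat V_j$, $j = 1, \ldots, \hat k^{KL}$. It follows that $\|\hat\Pi_k \mathbf{Y}\|^2 = n \langle \hat\theta_k, \hat\Gamma_n \hat\theta_k \rangle$. Since $\hat\theta_k = \hat\Gamma_{n,k}^{-} \Delta_n$ where $\hat\Gamma_{n,k}^{-}$ is the Moore-Penrose pseudo-inverse of $\hat\Gamma_{n,k}$, we obtain

$$\|\hat\Pi_k \mathbf{Y}\|^2 = n \langle \hat\Gamma_{n,k}^{-} \Delta_n, \hat\Gamma_n \hat\Gamma_{n,k}^{-} \Delta_n \rangle = n \langle \hat A_k \Delta_n, \hat A_k \hat\Gamma_n \hat\Gamma_{n,k}^{-} \Delta_n \rangle = n \|\hat A_k \Delta_n\|^2 .$$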




□ 
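The first expression in (20) is a ratio of a projected sum of squares to a residual variance estimate, and can be computed directly from the matrix of principal component scores. The sketch below assumes the scores are given column by column (one list per estimated eigenfunction) and uses Gram-Schmidt orthonormalisation in plain Python; the function names are illustrative, and the effective dimension is taken to be the number of linearly independent columns among the first `k`.

```python
import math


def orthonormalize(columns, tol=1e-12):
    """Modified Gram-Schmidt; columns that are numerically dependent are dropped."""
    basis = []
    for col in columns:
        v = list(col)
        for q in basis:
            coef = sum(a * b for a, b in zip(q, v))
            v = [a - coef * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > tol:
            basis.append([a / norm for a in v])
    return basis


def phi_k(score_columns, y, k):
    """Projected sum of squares divided by the residual variance estimate."""
    basis = orthonormalize(score_columns[:k])
    k_eff = len(basis)  # effective dimension of the projection space
    n = len(y)
    projected = sum(sum(q_i * y_i for q_i, y_i in zip(q, y)) ** 2 for q in basis)
    residual = sum(y_i * y_i for y_i in y) - projected
    return projected / (residual / (n - k_eff))


scores = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
response = [3.0, 4.0, 1.0, 1.0]
stat_1 = phi_k(scores, response, 1)
stat_2 = phi_k(scores, response, 2)
```

Under a Gaussian null and conditionally on the design, the statistic divided by the effective dimension would then be compared with a Fisher quantile, as described in the proof of Propositions 3.3 and 5.2.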



9.3. Proof of the type I error bounds 



We first prove Propositions 3.3 and 5.2. Afterwards, we derive Theorem 5.3. Finally, we 
explain how to adapt the arguments for Theorem 3.4. 

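Proof of Propositions 3.3 and 5.2. Let us assume that $\epsilon$ follows a Gaussian distribution and that $\theta = 0$. Conditionally on $\mathbf{X}$, the statistic $\phi_k(\mathbf{Y},\mathbf{X})/\hat k$ defined in (5.1) follows a Fisher distribution with $(\hat k, n - \hat k)$ degrees of freedom. Hence, conditionally on $\mathbf{X}$, the test $T_{\alpha,k}$ has a size exactly $\alpha$. Conditionally on $\mathbf{X}$, $T_\alpha^{(1)}$ is a Bonferroni procedure of Fisher statistics and its size is smaller than $\alpha$. Reintegrating with respect to $\mathbf{X}$, we derive that the size of $T_\alpha^{(1)}$ is smaller than $\alpha$. Let us turn to the second result. The quantity $q_{\mathbf{X},\alpha}$ satisfies

$$\mathbb{P}\left( \sup_{k \in \mathcal{K}_n} \left[ \frac{(n - \hat k)\, \|\hat\Pi_k \epsilon\|^2}{\hat k\, \|\epsilon - \hat\Pi_k \epsilon\|^2} - \bar F^{-1}_{\hat k, n - \hat k}(q_{\mathbf{X},\alpha}) \right] > 0 \;\Big|\; \mathbf{X} \right) = \alpha ,$$

which implies that $\mathbb{P}_0(T_\alpha^{(2)} > 0 \,|\, \mathbf{X}) = \alpha$, $\mathbf{X}$ a.s.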
Proof of Theorem 5.3. First, we state that k - 
Lemma 9.1. Consider the event A n defined by 



k with large probability. 



An 



sup 



A 7" A 7 



i<i<fc„ min{Aj 



Xj + i,Xj. 



A;} 



> 1/2 



□ 



(21) 



Under Assumptions B.2 and B.3, we have 



P(A0 < c( 7 ) 



kl\og 2 (k n Ve) 



<C{i) 



log 2 (n) 



where 7 is a positive constant involved in Assumption B.2. 



(22) 



This result, proved in Appendix D, relies on the perturbation theory of random operators. Observe that under the event $\mathcal{A}_n$, we have $\hat k = k$ for all $k \le k_n$. Consequently, we can replace $\hat k$ by $k$ in the definition of the test statistic up to an event of probability less than $C(\gamma)/\log(n)$. In the sequel, we use the alternative expression (20) of $\phi_k(\mathbf{Y},\mathbf{X})$ and we replace $\hat k$ by $k$. The proof is split into three main lemmas 9.2–9.4. The first lemma states that $\|\sqrt{n} A_k \Delta_n\|^2/\sigma^2$ behaves like a $\chi^2$ distribution. Its proof (Appendix C) relies on a multivariate Berry-Esseen theorem. The second lemma, which tells us that $\|\sqrt{n} \hat A_k \Delta_n\|^2/\sigma^2$ is close to $\|\sqrt{n} A_k \Delta_n\|^2/\sigma^2$, is proved below. The third lemma, proved in Appendix E, states that $\|\mathbf{Y} - \hat\Pi_k \mathbf{Y}\|^2/n$ concentrates well around $\sigma^2$.

Lemma 9.2. Assume that B.l and B'.3 hold. For any k > 1 and any x > 0, we have 



•(\\VnA k A n \\ 2 >x) -xk(x/a 2 )\ < C 



k 3 / 2 E [e 4 ] 



3/4 



sup E (r/ U) ) 4 
i<j<k 



3/4 



< 



c 



log 2 W ' 



uniformly over all k < k n 
Lemma 9.3. Assume that B.1–B'.3 hold. Writing $x_{n,k} = 1/(k \log^2(n))$, we have for all $k \le k_n$ and all $n \ge 5$,



\\V^A k A n \\ 2 > (l-aj^^-^IVnAfeAnll 2 ] < P [A n ] + 

Lemma 9.4. Uniformly over all k < k n , we have 

k log 2 (n) / log log n 



\og 2 {n) ' V lo gW 



(23) 



1 



> 



< 



log^(n) V" 

Let us upper bound the rejection probability due to the statistic <j) k 

fhA k A n \\ 2 



(Y,X)>kT^_ k (a/\IC n \) 

by the three following probabilities 
\WnA k A,, ' 



< 



||Y-n fc Y||2/ T 



>k?k,n-k («/l^n|) 



(1 - Xn,k)<J 2 



> k 1 - 



loglog(n) fclog 2 (n) 



•^fci-fe ("/l^r. 



+P 



/nA fc A„|| 2 > (1 - ir n , fe ) ^ y/nA k A n f 

k\og 2 (n) /log log (n) 



1 



> 



Gathering the above results, we obtain that this probability is upper bounded by 

k 



c(rr) + f ^ 



log 2 (?i) 1 V lo s( n ) 

uniformly over all k < k 



+ Xk 



k 1 



log log n fclog (n) 



X n ,k F k ,n-k («/|fcnl) 



(24) 



Lemma 9.5. Writing t = 8* ™ _i_ ("> 

numerical constant 



Xk 



< 



+ fc jog 1 ^ (-„■) , we /icrae /or n larger than some 
C{a) 



\JC n 



log(n) 



The proof of this technical lemma is postponed to Appendix E. We conclude by combining (24) with Lemma 9.5 and taking a union bound over all $k \in \mathcal{K}_n$ (recall that $|\mathcal{K}_n| \le \log(n)$). □

Proof of Theorem 3.4. Define $t = 8\sqrt{\log\log n / n} + k \log^2(n)/n + 1/(k \log^2(n))$. Gathering Lemmas 9.2, 9.3, and 9.4 as in the proof of Theorem 5.3 and relying on Condition B.3, we derive an upper bound analogous to (24)



c (Y,X)>fcJ--i_ fe (a) 



< 



Xk 



k{l~t)^l_ k (a) 



Chi , 
log(n) 



for n large enough. Applying the following inequality (proved in Appendix E) allows us 
to conclude. 
Lemma 9.6. For n larger than some numerical constant, we have

C{a) 



Xk 



k(l-t)^_ k (a 



< a 1 



log(n) 



□ 



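Proof of Lemma 9.3. From $\|b\|^2 - \|a\|^2 = 2\langle a, b - a \rangle + \|b - a\|^2$, we get

$$\frac{\big| \|b\|^2 - \|a\|^2 \big|}{\|a\|^2} \le \frac{\|b - a\|}{\|a\|} \left( 2 + \frac{\|b - a\|}{\|a\|} \right) .$$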



Since x n .k < 1 for n > 3, it follows that 



2 > HVnAfcAnlP 



1 *^n, t 



< 



ri ( A fc - Ak ) A„ 



< 



>/n (Ah -A k ^j A., 



> 



41og(n) 



\\y/nA k A n \ 



Iv^^fcAnll < 



> 



log(n) 



By Lemma 11.1 in [36], we know that for any $0 < x < 1$ and any integer $d \ge 1$, $\mathbb{P}[\chi^2(d) \le d e^{-1} x^{2/d}] \le x$. We get from Lemma 9.2 and the last deviation inequality that



|V«^4/cAJ| < 



ayk 
log(n) 



< 



C 



log 2 (n) ' V lo g(«) 



uniformly over all k < k n . Let us turn to the other term. By Markov inequality and by 
definition of x n>k , the first probability P[||v / n(Afe — Afc)A n || > ^^ff ] is smaller than 



16fclog 6 (n) 



E 



In order to conclude, we only need to bound E[||y / n(yl / r c — j4fe)A„|| 2 l^ ]. If we prove 

~ll\o£(n) k, 



E 



then we get 



A k - A k \) A„|| 2 l^r 



<C(7) 



(25) 



Au - A, A 



> 



a\fk. 



41og(n) 



log (n) 



by Assumption B'.3. Thus, it only remains to prove (25). 

Noticing that $\hat A_k - A_k$ only depends on the $X_i$'s, we derive that



E 



[l k - A fc ) A„ 



n 

= — E 
n 

= — E 
n 



A k - Ak) X x e x 



A k X x 





2 




h n 



+ \\A k X 1 \\< -2(A k X 1 ,AkX 1 ) 1^ 
We deal with each term separately:



E 



E 



1^*1 II 2 l: 



l^ill 1^ 



= E 

= E 

= E 

= E 



tr(A k {X 1 ®X 1 )A k j 1^ 
<k¥[A n ] , 



= E 



tr (A k r n A k 1 



trn fc i Xl 



tr ( A k T n A k l Xi 



[tr (A k TA k ) 1 X J + E [tr [A k (f „ - r) A k } 1- 



E 



A^AkX^l 



= E[trna x J- E 

< fcP [A 
= E ' 



tr A,, r» - r Mi, i 



k 1-Ar, 



E 



tr 2 ( A k ( r n - r ) A fc 



= E 



tr (A fe r n A fe j 1^ 

k¥(A n ) + E [tr {r- 1/2 (f $ - if 2 ) 1^ } 



It follows that 



E 



< 2cr 2 E 



tr{r; 1/2 (rf -fv»)} 



tr' 



(A k (f„ - r) A fe )] x/lPK] . (26) 



Lemma 9.7. Under Assumptions B.1 and B.2, we have for all $n \ge 1$,

i fc 3 pog 2 (fc)Vl] , c , k 



E 



tr{r^ /2 (rf -f;/ 2 )}i X; ]<c (7 ) 



(27) 



uniformly over all k < k n 



Lemma 9.7 is the core argument to control the behavior of the statistic. Its proof relies 
on perturbation theory and is postponed to Section 10. Let us compute the last term 



E 



tr 2 (A k (f „ - r) A fe ) 



E 



EE 



Iv 



(i)i2 



fc2 ,r 

< — sup Var 

n j>x 



j=i \i=i 



< c 



k 2 



by Assumption B.l. Combining this bound with (22), we get 

k„ /2 log(fc„ V e) 



E 



tr 2 [A k ( r n - r ) A k 



n/P[AJ<C( 7 )- 



Gathering Lemma 9.7 with (26), and (28) allows us to prove (25). 



9.4. Proofs of the type II error bounds



(28) 

□ 



Proof of Proposition 5.4. This proof follows the same steps as the proof of Proposition 3.2 in [37]. □

We first derive Theorem 6.1 and then explain how to adapt the arguments for Theorem 4.1.

Proof of Theorem 6.1. Arguing as in the beginning of the proof of Theorem 5.3, we can replace $\hat k$ by $k$ in the definition of the statistic (14). Consider some $k \in \mathcal{K}_n$ and take $n \ge 8$. The numerator of $\phi_k(\mathbf{Y},\mathbf{X})$ in (20) is lower bounded as follows

$$\|\sqrt{n} \hat A_k \Delta_n\|^2 \ge \|\sqrt{n} A_k \Delta_n\|^2 \left[ 1 - (\sqrt{k} \log(n))^{-1} \right] - \sqrt{k} \log(n)\, \|\sqrt{n} (\hat A_k - A_k) \Delta_n\|^2 ,$$

since $2ab \le a^2 + b^2$. Observe that $\Delta_n = \hat\Gamma_n \theta + \Delta_{n,1}$, where $\Delta_{n,1} = \sum_{i=1}^{n} \langle X_i, \cdot \rangle \epsilon_i / n$. The proof is based on the two main following lemmas.
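For completeness, the lower bound on the numerator can be spelled out as follows. This is a short worked derivation with the shorthand $\hat a = \sqrt{n}\hat A_k \Delta_n$, $a = \sqrt{n} A_k \Delta_n$ and $c = (\sqrt{k}\log(n))^{-1}$; the shorthand is introduced here for illustration only.

```latex
\begin{align*}
\|\hat a\|^2
  &= \|a\|^2 + 2\langle a, \hat a - a\rangle + \|\hat a - a\|^2 \\
  &\ge \|a\|^2 - 2\|a\|\,\|\hat a - a\|
     && \text{(Cauchy-Schwarz, dropping } \|\hat a - a\|^2 \ge 0\text{)} \\
  &\ge \|a\|^2 - c\,\|a\|^2 - c^{-1}\|\hat a - a\|^2
     && \text{(} 2xy \le c\,x^2 + c^{-1}y^2 \text{)} \\
  &= \|a\|^2\big[1 - (\sqrt{k}\log(n))^{-1}\big] - \sqrt{k}\log(n)\,\|\hat a - a\|^2 .
\end{align*}
```

The weighted inequality $2xy \le c\,x^2 + c^{-1}y^2$ is the elementary bound $2ab \le a^2 + b^2$ applied to $a = \sqrt{c}\,x$ and $b = y/\sqrt{c}$.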

Lemma 9.8. For any $\beta \in (0, 1)$, we have



\y/nA k A n \\ > fed 2 



-.1/2/1112 



- 2(T 2 Wfclog 



- 10cr 2 log 



o V w V' 3 

with probability larger than $1 - \beta/2 - C/\log(n)$ uniformly over all $k \le k_n$.

Lemma 9.9. Assume that B.1–B'.3 hold. For any $n \ge 1$ we have

C(7) 



/n(A fc - A fe )A nj i|| > 



log(n) 



< 



< 



log(n) 

C(7) 
log(n) 



uniformly over all k < k n 



Lemma 9.8 is based on a multivariate Berry-Esseen inequality and is proved in Appendix C. The second lemma proceeds from the same kind of arguments as Lemma 9.3. Thus, its proof is postponed to Appendix E. We get by gathering Lemmas 9.8 and 9.9 and since $\sqrt{k} \log(n) \ge 2$ for $n \ge 8$,



\\Vn~A k A n \\ 2 > ka 2 - 3a< 



log(n) 



2n- 



ir 1 / 2 ^! 2 

log(n) 



-CmWT^OW 2 ~C 2 a 2 [J* log 



/3 



log 







with probability larger than $1 - \beta/2 - C(\gamma)/\log(n)$. Next, we use a rough control of the denominator, proved in Section E.



Lemma 9.10 (Control of the denominator). We have



n fe Y| 



n — k 



1 + C 



log(ra) 



C'\\T l t 2 e\\ 2 /(5 , 



with probability larger than 1 — 2/log(n) — /3/4. 

Since i/log(2//3) > l/log(n) and C\ > 2/ log(n) for n large enough, we derive from the 
previous results that with probability larger than 1 — 3/3/4 — C(7)/log(n), the statistic 
0fc(Y, X) is lower bounded by 

ko 2 + 2 0|| 2 nC( - C 2 a 2 (Vfclog(l//3) + log(l//3)) - 2^11(1^2 _ rf 2 )0|| 2 



1 + C 



log(") 



c'\\T l / 2 e\\ 2 /p 



(29) 
By Lemma 1 in [3], we can upper bound the quantile of the Fisher distribution

$$k \bar F^{-1}_{k, n-k}(\alpha/|\mathcal{K}_n|) \le k + C \left[ \sqrt{k \log\Big( \frac{|\mathcal{K}_n|}{\alpha} \Big)} + \log\Big( \frac{|\mathcal{K}_n|}{\alpha} \Big) \right] , \qquad (30)$$

since we assume that $\log(|\mathcal{K}_n|/\alpha) \le \log(n) + \log(1/\alpha) \le 2\sqrt{n}$. Comparing the lower bound (29) with (30) allows us to conclude. We refer to Appendix E for the details. □

Proof of Theorem 4.1. We have shown in the last proof that $\phi_k(\mathbf{Y},\mathbf{X})$ is lower bounded by (29) with probability larger than $1 - 3\beta/4 - C(\gamma)/\log(n)$. By Lemma 1 in [3], we upper bound the quantile of the Fisher distribution

$$k \bar F^{-1}_{k, n-k}(\alpha) \le k + C \left[ \sqrt{k \log(1/\alpha)} + \log(1/\alpha) \right] ,$$

since $\log(1/\alpha) \le \sqrt{n}$. Comparing these two bounds leads us to the desired result.



□ 



10. Arguments based on perturbation theory 
10.1. Preliminary facts 

Roughly speaking, several results mentioned below are based on an extension of the classical residue formula on the complex plane (see Rudin [34]) to analytic functions still defined on the complex plane but with values in the space of operators. We refer to Dunford and Schwartz [18, Chapter VII.3] or to Gohberg et al. [19, 20] for an introduction to functional calculus for operators related with Riesz integrals. Let us denote $\mathcal{B}_j$ the oriented circle of the complex plane with center $\lambda_j$ and radius $\delta_j/2$ where $\delta_j$ is defined by

$$\delta_j = \min\{\lambda_j - \lambda_{j+1},\ \lambda_{j-1} - \lambda_j\} . \qquad (31)$$

The open domain whose boundary is $\mathcal{C}_k := \bigcup_{j=1}^{k} \mathcal{B}_j$ is not connected but we can apply the functional calculus for bounded operators (see Dunford and Schwartz [18, Section VII.3, Definitions 8 and 9]). Using this formalism it is easy to prove the following formulas:

$$\Pi_k = \frac{1}{2\pi i} \int_{\mathcal{C}_k} (zI - \Gamma)^{-1}\, dz \quad \text{and} \quad \Gamma_k^{1/2} = \frac{1}{2\pi i} \int_{\mathcal{C}_k} z^{1/2} (zI - \Gamma)^{-1}\, dz .$$

The same is true with the random operator $\hat\Gamma_n$, but the contour $\mathcal{C}_k$ must be replaced by its random counterpart $\hat{\mathcal{C}}_k = \bigcup_{j=1}^{k \wedge \mathrm{Rank}(\hat\Gamma_n)} \hat{\mathcal{B}}_j$ where each $\hat{\mathcal{B}}_j$ is a random circle of the complex plane with center $\hat\lambda_j$ and radius $\hat\delta_j/2$, with $\hat\delta_j = \min\{\hat\lambda_j - \hat\lambda_{j+1},\ \hat\lambda_{j-1} - \hat\lambda_j\}$. We start with some lemmas.
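The contour formula for the projector can be checked numerically in finite dimension. The sketch below is an illustration on a $2 \times 2$ symmetric matrix, not part of the proofs: it approximates $\frac{1}{2\pi i}\oint (zI - \Gamma)^{-1} dz$ on a circle centred at one eigenvalue, with radius equal to half the spectral gap, by the trapezoidal rule. The matrix, the function names and the number of nodes are assumptions chosen for the example.

```python
import cmath
import math


def resolvent_2x2(z, g):
    """Return (z*I - g)^{-1} for a 2x2 matrix g given as nested lists."""
    a, b = z - g[0][0], -g[0][1]
    c, d = -g[1][0], z - g[1][1]
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def riesz_projection(g, center, radius, n_nodes=400):
    """Approximate (1 / (2*pi*i)) * contour integral of the resolvent on a circle."""
    acc = [[0j, 0j], [0j, 0j]]
    for m in range(n_nodes):
        w = radius * cmath.exp(2j * math.pi * m / n_nodes)  # w = z - center
        res = resolvent_2x2(center + w, g)
        # dz = i * w * d(angle), so the weight (1 / (2*pi*i)) * dz equals w / n_nodes
        for p in range(2):
            for q in range(2):
                acc[p][q] += res[p][q] * w / n_nodes
    return [[entry.real for entry in row] for row in acc]


gamma = [[2.0, 1.0], [1.0, 2.0]]  # eigenvalues 3 and 1, so the gap is 2
proj_top = riesz_projection(gamma, center=3.0, radius=1.0)
proj_bottom = riesz_projection(gamma, center=1.0, radius=1.0)
```

Here the circle around the eigenvalue 3 encloses no other eigenvalue, so the integral returns the orthogonal projector onto the eigenvector $(1, 1)/\sqrt{2}$; the two projectors sum to the identity.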

Lemma 10.1. Assume that for some $\gamma > 0$, the sequence $(j \lambda_j \log^{1+\gamma}(j \vee 2))_{j \ge 1}$ decreases. Then, we have

E iX^^y <^)^[lo g A:Vl] . 
For any positive integer j, let us define the event 



Sj n := < sup 
I zeBj 



(zI-T)- 1/2 (f n -r)(zI-Ty 1/2 >l/2 
Lemma 10.2. Suppose that Assumptions B.1–2 hold. For any $j \ge 1$, we have the two following bounds



E sup 



(z/-rr 1/2 (r n -r)(^-r 



\-V2 



'(£,>)< ^ b'(iogjvi)] 2 



The proof of Lemma 10.1 (resp. 10.2) is postponed to Appendix E (resp. D).



10.2. Proof of Lemma 9.7 

In order to upper bound this expectation, we set $\hat\lambda_j = 0$ for any $j > \mathrm{Rank}(\hat\Gamma_n)$. We have



tr 



p-l/2 /Va/a _ pl/2\ i 

1 ft I A fc 1 n,*;^ x ^ 



i=i l=i v 



< kl A n - E 

j=l 

^Ef 1 -^^) W+E 



j=i 

k 



fc 



where the last equation follows from the upper bound $|\sqrt{1 + x} - 1| \le |x|$ for any $x \ge -1$.
Observe that under the event A n , (Aj — < 1/2. Applying Lemma 10.2, we obtain 

the following bound 



E 



tr[r-/ 2 (rf -f#)i 



<E E 

fe 

<E E 



k 

E E 

fe 

E E 

i=i 



|A,--A,-| 



lAi-Ail. 



i=i 



C(7) 



fc 3 (log 2 (/c)Vl) 



(32) 



In the sequel, $\pi_j$ stands for the orthogonal projector associated to the single $j$-th eigenvector $V_j$ while $\hat\pi_j$ refers to its empirical counterpart. Applying functional calculus tools for linear operators, we get for any $1 \le j \le k$

$$1 - \langle \hat V_j, V_j \rangle^2 = \langle V_j, V_j \rangle^2 - \langle \hat V_j, V_j \rangle^2 = \langle (\pi_j - \hat\pi_j) V_j, V_j \rangle$$



1 

2vr7 



dz 



which looks like the definition of $\Pi_k$ given in the first paragraph of Section 10.1 (note that only the contour changed). Under the event $\mathcal{A}_n$, $\hat\lambda_j$ lies inside the circle $\mathcal{B}_j$. In fact, $(zI - \hat\Gamma_n)^{-1}$ has only one pole inside the circle $\mathcal{B}_j$ at $z = \hat\lambda_j$. As a consequence, we have almost surely



/«>'- p ")"' 



V,,Vi)dzl^ 



A„ 



zI-T n ) Vi,Vi)dzl^ 



An ' 



^i. ) dzl e } , n nA n ■ 
(33) 



so that 

T/. T/A 1 ' _ - 

L £j , n nA n 2ni 
Working out this integral, we get 

= - / ((*/ - rr 1 (f„ - r) (zi - ry 1 v h Vj 

~ L ( ( z/ ~ 1 ^ n ~ r ) (z/ ~ r)1 ^ n ~ r ) (z/ ~ rrl Vj,Vj ) dz ■ 

The first term is j B .( z — ^j)~ 2 ((^n — Vj)dz. Thus, it is null almost surely by 

the Cauchy integration theorem. Define S n (z) = (zl — T) 1 ^ 2 (zl — r„) _1 {zl — T) 1 ^ 2 
and T n (z) = (zl - T)~ 1/2 (?„ - T) (zJ - r)~ 1/2 . For any fixed z, we have S n (z) = 
[I — T n (z)] 1 . Thus, it comes from (33) that 



E 



•-- E 



L £j, n nA„ 



2vrt 



L jf ((z/ - r)- 1/2 5 n (z) t 2 (z) (z/ - r)- 1/2 vs-, vg-) %. 
{ll^(2)ILi^J|T„(*C (^-r)" 1 } 

(log 2 (j) VI) 



sup 



< CSj sup 


(zi-r)- 1 


E 


sup \\T n (z)^ 


< 






oo 







< c( 7 y- 



(34) 



since sup z€B] 



(zI-T) 1 <25- 1 ,sup 2eB J|5 n (z)|| co % <2andE[sup 2ee J|T„(z)|| 2 <) ]< 

^-^j 2 (log 2 (j) V 1) by Lemma 10.2. Hence, we obtain an upper bound for the first term 
in (32) 



E 



3 = 1 



<C(7) 



fc 3 (log 2 (fc) V 1) 



(35) 



Turning to the second term in (32), we only provide a sketch of the proof since the 
approach is the same as the first term in (32). We have 



Xj - Xj = tr ( T n TTj - T-Kj ) = tr ( T n (nj - 7Tj) J + tr ( ( T n - T ) itj y . 

so that 



tr 



(f n 0r 3 ;-7Tj)) 



(( 



tr r„ - r Tr 
The second term in this decomposition is bounded as follows



E 



tr ( ( r n - r ) 



< E 



v n -r)v j ,v j 



< 



(37) 



We turn to E 



bounding (1 - (Vj,Vj) 2 ). 



tr (r„ (9 3 - ttj) 



L A n ns jtn 



and we use the same method as above for 



-tr 



L A n n£j 



\j2~KL 



-tr 



T n (zi-Y n \ r n -r ){zI-T) L dz 



-4„n£,- 



Aj2lTL 



-tr 



\j2~KL 



-tr 



jf (f n 1 -r(zi-r)- 1 ^ (f„-r) (zi-vy'dz 

I z(*i-f„) 1 (f„-r)(z/-r)- 1 (r„-r)(z/-r)- 1 dz 

J Da 



L A n n£ ]/ 



\j2lIL 



tr 



z (z/ - ry 1/2 s n (z) ti ( z ) {zi - ry 1/z dz 



1/2. 



L A n n£ 3irl ■ 



From the upper bound 



tr 



< \\(zi - ryWsMiHsWTnizi - r)- 1 / 2 ^ < \\{zi - r)- 1 || 00 ||s n || 00 ||T n | 

we derive as in the proof of (35) 
1 



2 

HS > 



A, 



-E 



tr [T n (nj -TTj)^ 

[ \z\ (zI-T)- 1 \\S n (z)\\ 00 \\T n (z)f HS dzl I 



< C*E 



sup ||T„ {z)\\ HS l£ 

zeBj 



<c(rf) 



j 2 (log 2 j VI) 



Gathering (36) and (37) with this last bound, we get 



k 



< fe 3 (lQg2(fc)vl)+g/ ,^ 



Combining this last bound with (32) and (35) allows us to conclude. 
Appendix A: Power under Gaussian Noise
A.1. Power of $T_{\alpha,k}$

Proposition A.1 (Power under Gaussian errors). There exist positive constants $C$, $C_1(\beta)$, and $C_2$ such that the following holds. Suppose that $\alpha \ge \exp(-n/20)$, $\beta \ge C/n$ and that Assumptions B.1 and A.1 are true. Then, $\mathbb{P}_\theta(T_{\alpha,k} > 0) \ge 1 - \beta$ for any $\theta$ satisfying



a, 



c 2 



j>k+i 



Ik log 



log rt 

a/3 



loe 



loe 



a/3 



(A.l) 



Remark A.1. Although this result requires only very weak assumptions on the process $X$ (a fourth moment assumption), the bound (A.1) is slightly looser than (7) in Theorem 4.1 because

$$\|(\Gamma^{1/2} - \Gamma_k^{1/2})\theta\|^2 \le \lambda_{k+1}\|\theta\|^2 .$$



A.2. Power of $T_\alpha^{(1)}$

A similar result holds for $T_\alpha^{(1)}$.

Proposition A.2. There exist positive constants $C$, $C_1(\beta)$, and $C_2$ such that the following holds. Suppose that $\alpha \ge \exp(-n/20)$, $\beta \ge C/n$ and that Assumptions B.1 and A.1 are true. Then, $\mathbb{P}_\theta(T_\alpha^{(1)} > 0) \ge 1 - \beta$ for any $\theta$ satisfying



ir^f > inf C^P) A fc+1 + £ 

j>k+i 




'fclog 



logn 

a/3 



log 



logn 



A.3. Proofs of Propositions A.1 and A.2

We first prove Proposition A.2 and then adapt the arguments to Proposition A.1.

Proof of Proposition A.2. Let us first work conditionally on $\mathbf{X}$. In this case, the design $\mathbf{X}$ and the projection $\hat\Pi_k$ are considered as fixed. Thus, the statistic $T_\alpha^{(1)}$ is analogous to the procedure of Baraud et al. [3]. By Theorem 1 in [3], we have $\mathbb{P}_\theta(T_\alpha^{(1)} > 0) \ge 1 - \beta/2$ if $\theta$ satisfies $n\langle\theta, \hat\Gamma_n\theta\rangle \ge \inf_{k\in\mathcal{K}_n} \Delta(\theta, k, \mathbf{X})$ where $\Delta(\theta, k, \mathbf{X})$ is defined by



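$$C_1\langle\theta, \hat\Pi^{\perp}_{k^{KL}}\hat\Gamma_n\theta\rangle + C_2\sqrt{k^{KL}\log\Bigl(\frac{2\log n}{\alpha\beta}\Bigr)}\,\sigma^2 + C_3\log\Bigl(\frac{2\log n}{\alpha\beta}\Bigr)\sigma^2 \tag{A.2}$$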



since $\alpha \ge \exp(-n/20)$ and $\beta \ge C/n$. We have $n\langle\theta, \hat\Gamma_n\theta\rangle = \sum_{i=1}^n \langle X_i, \theta\rangle^2$. By Assumption B.1, we get



E[(A,( 



E 



£VW> 



< C((9,r6>) 4 



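Applying Chebyshev's inequality, we have

$$\langle\theta, \hat\Gamma_n\theta\rangle \ge \mathbb{E}[\langle X, \theta\rangle^2]/2, \quad\text{with probability larger than } 1 - \beta/4 \text{ as long as } \beta \ge C/n . \tag{A.3}$$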
Let us fix some $k \in \mathcal{K}_n$. We have $\langle\theta, \hat\Pi^{\perp}_{k^{KL}}\hat\Gamma_n\theta\rangle \le \|\theta\|^2\hat\lambda_{k^{KL}+1}$. Observe that $k^{KL} + 1 < k + 1$ only if $\hat\lambda_{k^{KL}+1} = 0$. Consequently, we also have $\langle\theta, \hat\Pi^{\perp}_{k^{KL}}\hat\Gamma_n\theta\rangle \le \|\theta\|^2\hat\lambda_{k+1}$.

To conclude it is sufficient to provide an upper bound of $\hat\lambda_{k+1}$ with high probability. By definition of $\hat\lambda_{k+1}$, we have


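$$\hat\lambda_{k+1} = \inf_{W,\ \mathrm{Codim}(W)=k}\ \sup_{z\in W,\ \|z\|=1} \langle z, \hat\Gamma_n z\rangle \le \sup_{z\in \mathrm{Vect}(V_{k+1},\ldots),\ \|z\|=1} \langle z, \hat\Gamma_n z\rangle ,$$

implying that

$$\hat\lambda_{k+1} \le \|\Pi_k^{\perp}\hat\Gamma_n\Pi_k^{\perp}\|_\infty \le \lambda_{k+1} + \|\Pi_k^{\perp}(\Gamma - \hat\Gamma_n)\Pi_k^{\perp}\|_\infty \le \lambda_{k+1} + \|\Pi_k^{\perp}(\Gamma - \hat\Gamma_n)\Pi_k^{\perp}\|_{HS} .$$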

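Hence, it is sufficient to bound the Hilbert-Schmidt norm $\|\Pi_k^{\perp}(\Gamma - \hat\Gamma_n)\Pi_k^{\perp}\|_{HS}$ in probability. By Jensen's inequality, we have $\mathbb{E}[\|\Pi_k^{\perp}(\Gamma - \hat\Gamma_n)\Pi_k^{\perp}\|_{HS}] \le \mathbb{E}[\|\Pi_k^{\perp}(\Gamma - \hat\Gamma_n)\Pi_k^{\perp}\|_{HS}^2]^{1/2}$ and simple calculations lead to

$$\mathbb{E}[\|\Pi_k^{\perp}(\Gamma - \hat\Gamma_n)\Pi_k^{\perp}\|_{HS}^2] = \frac{1}{n}\mathbb{E}[\|\Pi_k^{\perp}\Gamma\Pi_k^{\perp} - (\Pi_k^{\perp}X)\otimes(\Pi_k^{\perp}X)\|_{HS}^2] .$$

By Assumption B.1, we conclude that

$$\mathbb{E}[\|\Pi_k^{\perp}\Gamma\Pi_k^{\perp} - (\Pi_k^{\perp}X)\otimes(\Pi_k^{\perp}X)\|_{HS}^2] \le \mathbb{E}[\|\Pi_k^{\perp}X\|^4] \le C\Bigl(\sum_{j\ge k+1}\lambda_j\Bigr)^2 .$$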

By Markov inequality, we conclude that Afe + i < A^ + C(f3) Ylj>k+i w ^h probability 
larger than 1 — (3/4. Gathering this probability bound with (A. 2) and (A. 3), we derive 
that P 9 {t£ ] > 0) > 1 - /3 if satisfies for some k e K n , 



| r l/2^|,2 > <h i 

n 



X k + C(/3) ^ 
j>k+i 



-Co 



'/clog 



2 logn 

a/3 



log 



2 log n 

a/3 



□ 



Proof of Proposition A.1. As in the previous proof, we apply Theorem 1 in [3] except that there is now only one test (instead of $|\mathcal{K}_n|$ tests). We have $\mathbb{P}_\theta(T_{\alpha,k} > 0) \ge 1 - \beta/2$ if $\theta$ satisfies



n{6,T n d) > C 1 (9,Uf KL r n e)+C 2] Jk KL \og 
Furthermore, we have shown that 



21ogn 

a(3 



cr 2 + C 3 loe 



2 logn 

a/3 



> E[(AT,0) 2 ]/2 



,n£«r„0) < 



j>k+i 



with probability larger than $1 - \beta/2$. Gathering these three bounds leads to the desired result. □
Appendix B: Proofs of the minimax lower bounds

Proof of Proposition 4.4. For any dimension $k \ge 1$, we define $r_k^2 = C(\alpha,\beta)\frac{\sqrt{k}}{n} \wedge \lambda_k a_k^2 R^2$, where the constant $C(\alpha,\beta)$ will be fixed later. For any $\theta \in \mathrm{Vect}(V_1, \ldots, V_k)$ such that $\|\Gamma^{1/2}\theta\|^2/\sigma^2 \le r_k^2$, we have

t ^ < Ad^a < i£ < «v 
ft a ? Afca ' ft a ' Afc 

since $r_k^2 \le \lambda_k a_k^2 R^2$ and since the $\lambda_j a_j^2$'s are non increasing. As a consequence,

$$\{\theta \in \mathrm{Vect}(V_1,\ldots,V_k),\ \|\Gamma^{1/2}\theta\|^2/\sigma^2 = r_k^2\} \subset \{\theta \in \mathcal{E}_a(R),\ \|\Gamma^{1/2}\theta\|^2/\sigma^2 \ge r_k^2\} .$$

Since $X$ is a centered Gaussian process, $(\langle X,V_1\rangle, \ldots, \langle X,V_k\rangle)$ is a centered Gaussian vector. Assuming that $\theta$ belongs to $\mathrm{Vect}(V_1, \ldots, V_k)$ and that $(V_1, \ldots, V_k)$ is known, the functional linear model translates as a linear Gaussian model with Gaussian design as studied in [37]:

$$Y = \sum_{j=1}^{k} \langle X, V_j\rangle\langle\theta, V_j\rangle + \epsilon .$$

By Proposition 4.2 in [37], there exists a constant $C(\alpha,\beta)$ such that for any test $T$ of level $\alpha$, we have



/3 



T-{6£ VectO^, . . . , Vfe), a > 0, Hr 1 / 2 ^ 2 > C(a, /?)— a 2 



>/3 



Gathering this last bound for all k > 1 allows us to conclude. □ 

Proof of Proposition 6.3. As in the last proof, we shall adapt results for the Gaussian linear regression model with Gaussian design. Let $k^*(R) \in \mathbb{N}^*$ be an integer that achieves the supremum of $\tilde r_k^2 = C(\alpha,\beta)\sqrt{k\log\log(k\vee 3)}/n \wedge R^2 a_k^2\lambda_k$. We note as in the last proof that for any $R > 0$ and $k^*(R)$ in $\mathbb{N}^*$,

||pl/20||2 -) f ||pl/2jQ||2 

9 £ Vect(Vi, . . . , Vfe, (fl) ), = r 2 k , (R) c e £ a (J2), > ^ 



Thus, we obtain 



Ul^Vect^,-,^ Var( jl) f ||g||2 

9 £ Vect(7i, . . • , Vfe, (if) ) " " = r 2 , (i?) 



= C(a,/3)VfclogIog(fcV3)/n 



c 



fl>0 

Hence, we only have to provide a minimax lower bound for simultaneously testing over a family of nested linear spaces. Letting $p$ go to infinity in Proposition 5.5 in [37], we obtain that







U {° e Vect ^' • ■ ■ > V *) > Var( ^_ /2 ||ri/ 2 g|| 2 = «Vfcloglog(fcV3)/n 



which allows us to conclude. 



□ 
Appendix C: Proofs based on Berry-Esseen type inequalities

Proof of Lemma 9.2. Let us fix some $k > 0$. For any $1 \le j \le k$, we have



1 " 

Vti(A n ,A k Vj) = — V(A, 



n 



For any 1 < ji < j% < fc and 1 < i < j, the random variables W Q and rj^e^ are 
uncorrelated. By the central limit theorem, we conclude that ||v / rL4/ c A„|| 2 /cr 2 converges 
in distribution towards a x 2 (fc) random variable, at least when k is fixed. 

In order to precisely control the tails of H-y/nAfcAnH 2 , the central limit theorem is not 
sufficient. We need a Berry-Esseen type inequality. Let us call $W_i$ the vector of size $k$ whose $j$-th component is $\eta_i^{(j)}\epsilon_i$. We note $\|W_i\|_k$ its Euclidean norm. By Assumption B.1, we have

$$\mathbb{E}[\|W_i\|_k^3] \le k^{3/2}\,\mathbb{E}[\epsilon^4]^{3/4} \sup_{1\le j\le k}\mathbb{E}[(\eta^{(j)})^4]^{3/4} .$$

Applying the second part of Theorem 1.1 in Bentkus [4], we obtain



sup|P(||\/^4 fc A„|| 2 > x)-xk(x/a 2 )\ < C 



fc 3 / 2 E [e 4 ] 



3/4 



x>0 



sup E 

l<j<k 



(v 



3/4 



We conclude by applying Assumption B'.3. 



□ 



Proof of Lemma 9.8. As explained in the proof of Lemma 9.2, i/n/crAfcA nj i converges to 
a Gaussian process whose covariance operator is defined by = X)j=i(K? > - For 
j = 1, . . . , k, we define & = (A] /2 y / Var([^)] 2 )((9, V,-))" 1 if (9, V ) 2 ^ and = else. 

Consider the operator Du = Sj=i For any j = 1, . . . , k such that £j 7^ 0, we 

have 



Var([r ? 0)] 2 ) v /nVar([? ? W)] 2 ) 



-1/2 

As a consequence, t/n[DkAkT n 8 — D k T k 9) converges in distribution towards a Gaussian 
process whose covariance operator E' fc is defined by S' fe = 2,- =1 (Vjf, .)\^1^^q. Further- 

1/2 

more, the processes y/n/a AkA nt i and ^fn(DkAkF n 9 — DkF j~ 9) are asymptotically inde- 
pendent. Let us consider the random vector Z of size fc' := k + #{ j S {1, . . . , k} : ^ 0} 
such that Zj = e/crqW) if j = 1, . . . , k and Z, = ([r?«] 2 - 1)/ ^Var^O')] 2 ) if j > fc. Let 
us upper bound E[||Z|||,] 

E[||Z|||,] < Ck 3/2 



Ek 4 l :i/4 



max E 

i<j<k 



(v 



3/4 



V max E 
i<i<fc 



( „0))8 



3/4 



We note Zj., . . . ,Z„ the n observations of the vector Z, based on rj 2 and e< for i = 
1, . . . , n. By Assumptions B.l and B.4, we can apply the Berry-Esseen type inequality of 
Bentkus (Theorem 1.1 in [4]) in dimension fc'. For any convex set *4, we obtain 



Y.-T^ A ) -V[Af k >(0,h')eA] 



< c 



fc 7 /4 



E[e 4 ] 3 / 4 



max E 

l<j<k 



3/4 



V max E 

i<j<k 



3/4^ 
Moreover, this last quantity is smaller than $Cn^{-1/16}\log^{-7}(n)$ uniformly over all $k \le k_n$ by Assumption B'.3. Consider a standard Gaussian vector $(u_1, \ldots, u_{2k})$. We define the random vector $W$ by



W = J2 (V"*J( 6 > V i) + ^ (0, VJ) N /Var([^)]2)« j + au j+k ) . 
We derive from the definition of W and the previous Berry-Esseen inequality that 



sup 

x>0 



nA k „ [T n 9 + A n i 



> x - P(VF > x) 



< 



C 



log 7 (n) 



Conditionally to $(u_1, \ldots, u_k)$, $W/\sigma^2$ follows a non-central $\chi^2$ distribution with $k$ degrees of freedom and non-centrality parameter



V ■■= E (V">*(0> V i) + y/^M Vjy/vzrWWu,) /a 2 . 

j=l V / 

By a deviation inequality on non-central $\chi^2$ distributions (e.g. Eq. 18 in [3]), we derive that, conditionally to $(u_1, \ldots, u_k)$,



W>ka 2 + -V<7 2 - 2cr 2 Jk\og(2/f3) - 10a 2 log(2//3) , 
5 

with probability larger than 1 — /3/2. The non-centrality parameter V is a polynomial func- 
tion of independent normal variables. Applying a deviation inequality for normal variables, 
we derive that V > n/4\\rl /2 6\\ 2 /a 2 with probability larger than l-£*=i exp [-raVar([r/-?)] 2 )/8] . 
All in all, we conclude that 

| V^A k , n (f n 9 + A 2 ) || 2 > ka 2 + ^||i^ /2 0|| 2 - 2a 2 Vfclog(2//3) - 10a 2 log(2//3) , 

with probability larger than 1 — j3/2 — C/log 7 (n) — nexp[— C'n]. 

□ 



Appendix D: Remaining proofs based on perturbation theory 
D.l. Proof of Lemma 10.2 

The second bound straightforwardly follows from the first bound by Markov inequality. Fix $z \in \mathcal{B}_j$. We have

2 

HS 



(zi-T)- 1/2 (fn-r) (zi-ry 1/2 

-j-OO -j~oo 

EE(( Z /-rr 1/2 (f n -r) (zi-r)-^ 



v h v k ) = 



1=1 k =l 



E 

l,k=l 



+E2 ([T n -T)Vi,V k 



\z — A/I \z — Ai 



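Since for $z = \lambda_j + \frac{\delta_j}{2}e^{\iota\theta} \in \mathcal{B}_j$ and $i \ne j$

$$|z - \lambda_i| = \Bigl|\lambda_j - \lambda_i + \frac{\delta_j}{2}e^{\iota\theta}\Bigr| \ge |\lambda_j - \lambda_i| - \frac{\delta_j}{2} \ge |\lambda_j - \lambda_i|/2 ,$$

we have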

+00 

E 

l,k=l 



f n -r)v h v k ) 2 +2? ((f„-r)^,Vfe\ 2 +2^ ((f n -r)v 3 ,v k ^ 



\z - Ail \z - \ k 



< 



4E 



, fc =i, |A,-A ; ||A,-A fe | ; ( 

1 - n^-;' 



<5j |Aj - A fc | 



3 



Applying Assumption B.l, we derive 

2- 



E 



E lz _ Xl \\z-X k \ 



C 

< — 
n 



AfcA; 



A| 



+00 

E 



+00 

E 



< 



a 



E 



A* 



Aj 



fc>l, k^j 

Applying Lemma 10.1 and Assumption B.2 allows us to conclude. 



\X k -Xj\ \Xj-X i+ i\ 2 IAj-x-A^ 



D.2. Proof of Lemma 9.1 

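For any $2 \le j \le k_n$, we define $\delta'_j := \max(\lambda_j - \lambda_{j+1}, \lambda_{j-1} - \lambda_j)$. Then, we build an oriented circle $\mathcal{B}'_j$ on the complex plane of radius $(\delta'_j - \delta_j)/4$ in such a way that any real number between $(\lambda_j + \lambda_{j+1})/2$ and $(\lambda_j + \lambda_{j-1})/2$ is either inside $\mathcal{B}_j$ or $\mathcal{B}'_j$. See Figure 2 for an example of $\mathcal{B}_j$ and $\mathcal{B}'_j$.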




Lemma 9.1 is a straightforward consequence of the two following lemmas. Let us define

$$T_n(z) = (zI - \Gamma)^{-1/2}(\hat\Gamma_n - \Gamma)(zI - \Gamma)^{-1/2} \quad\text{and}\quad S_n(z) = (zI - \Gamma)^{1/2}(zI - \hat\Gamma_n)^{-1}(zI - \Gamma)^{1/2} .$$
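The identity $S_n(z)(I - T_n(z)) = I$, which the proof of Lemma D.1 relies on, follows directly from the definitions $T_n(z) = (zI-\Gamma)^{-1/2}(\hat\Gamma_n-\Gamma)(zI-\Gamma)^{-1/2}$ and $S_n(z) = (zI-\Gamma)^{1/2}(zI-\hat\Gamma_n)^{-1}(zI-\Gamma)^{1/2}$. A short derivation, assuming that $z$ lies outside the spectra of both $\Gamma$ and $\hat\Gamma_n$ so that every inverse and square root below exists:

```latex
\begin{align*}
I - T_n(z)
  &= (zI-\Gamma)^{-1/2}\bigl[(zI-\Gamma) - (\hat\Gamma_n-\Gamma)\bigr](zI-\Gamma)^{-1/2} \\
  &= (zI-\Gamma)^{-1/2}(zI-\hat\Gamma_n)(zI-\Gamma)^{-1/2}, \\
S_n(z)\bigl(I - T_n(z)\bigr)
  &= (zI-\Gamma)^{1/2}(zI-\hat\Gamma_n)^{-1}(zI-\Gamma)^{1/2}\,
     (zI-\Gamma)^{-1/2}(zI-\hat\Gamma_n)(zI-\Gamma)^{-1/2} \\
  &= I .
\end{align*}
```

In particular, whenever $\|T_n(z)\|_\infty < 1$, the Neumann series for $(I - T_n(z))^{-1}$ gives $\|S_n(z)\|_\infty \le (1 - \|T_n(z)\|_\infty)^{-1}$.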



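Lemma D.1. We have

$$\mathcal{A}_n \subset \mathcal{E}_n \cup \mathcal{E}'_n \cup \Bigl\{\hat\lambda_1 \ge \frac{3\lambda_1 - \lambda_2}{2}\Bigr\} ,$$

where

$$\mathcal{E}_n := \Bigl\{\sup_{1\le j\le k_n}\ \sup_{z\in\mathcal{B}_j} \|T_n(z)\|_\infty \ge 0.5\Bigr\}, \qquad \mathcal{E}'_n := \Bigl\{\sup_{2\le j\le k_n}\ \sup_{z\in\mathcal{B}'_j} \|T_n(z)\|_\infty \ge 0.5\Bigr\} .$$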



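Lemma D.2. Under Assumptions B.1 and B.2, we have

$$\mathbb{P}(\mathcal{E}_n) \le C_1(\gamma)\frac{k_n^3\log^2(k_n\vee e)}{n}, \qquad \mathbb{P}(\mathcal{E}'_n) \le C_2(\gamma)\frac{k_n^3\log^2(k_n\vee e)}{n}, \qquad \mathbb{P}\Bigl[\hat\lambda_1 \ge \frac{3\lambda_1 - \lambda_2}{2}\Bigr] \le \frac{C_3(\gamma)}{n} .$$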
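Proof of Lemma D.1. Suppose that the four following events hold: 1) $\hat\Gamma_n$ has no eigenvalue on all the contours $\mathcal{B}_j$ and $\mathcal{B}'_j$. 2) For each $1 \le j \le k_n$, $\hat\Gamma_n$ has exactly one eigenvalue inside the circle $\mathcal{B}_j$. 3) For each $2 \le j \le k_n$, $\hat\Gamma_n$ has no eigenvalue inside the circle $\mathcal{B}'_j$. 4) $\hat\lambda_1 < (3\lambda_1 - \lambda_2)/2$. In such a case, the complementary event $\overline{\mathcal{A}}_n$ is true. As a consequence, $\mathcal{A}_n$ is included in the union of the four following events denoted $\mathcal{D}_1$, $\mathcal{D}_2$, $\mathcal{D}_3$ and $\mathcal{D}_4$.

• For some $1 \le j \le k_n$, $\hat\Gamma_n$ has an eigenvalue that lies on one of the contours $\mathcal{B}_j$ or $\mathcal{B}'_j$.

• For some $1 \le j \le k_n$, $\hat\Gamma_n$ has either 0 or at least 2 eigenvalues inside the circle $\mathcal{B}_j$.

• For some $2 \le j \le k_n$, $\hat\Gamma_n$ has at least 1 eigenvalue inside the circle $\mathcal{B}'_j$.

• $\hat\lambda_1 \ge (3\lambda_1 - \lambda_2)/2$.

We shall prove that $\mathcal{D}_1 \subset \mathcal{E}_n \cup \mathcal{E}'_n$, that $\mathcal{D}_2 \setminus \mathcal{D}_1 \subset \mathcal{E}_n$ and that $\mathcal{D}_3 \setminus \mathcal{D}_1 \subset \mathcal{E}'_n$.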

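Event $\mathcal{D}_1$. Assume that an eigenvalue of $\hat\Gamma_n$ lies exactly on some contour $\mathcal{B}_j \cup \mathcal{B}'_j$. Let us call $\hat\lambda$ such an eigenvalue and $\hat V$ a corresponding eigenvector. We have

$$T_n(\hat\lambda)(\hat\lambda I - \Gamma)^{1/2}\hat V = (\hat\lambda I - \Gamma)^{-1/2}(\hat\Gamma_n - \Gamma)\hat V = (\hat\lambda I - \Gamma)^{-1/2}(\hat\lambda I - \Gamma)\hat V = (\hat\lambda I - \Gamma)^{1/2}\hat V .$$

Since $\hat\lambda$ is not an eigenvalue of $\Gamma$, we have $(\hat\lambda I - \Gamma)^{1/2}\hat V \ne 0$ so that $\sup_{z\in\mathcal{B}_j\cup\mathcal{B}'_j}\|T_n(z)\|_\infty \ge 1$. Hence, $\mathcal{D}_1 \subset \mathcal{E}_n \cup \mathcal{E}'_n$.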

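Event $\mathcal{D}_2 \setminus \mathcal{D}_1$. Assume that $\mathcal{D}_2 \setminus \mathcal{D}_1$ is true. It follows that for some $1 \le j^* \le k_n$ the operator $(2\pi\iota)^{-1}\oint_{\mathcal{B}_{j^*}}(zI - \hat\Gamma_n)^{-1}dz$ is an orthogonal projector $\hat\pi_{j^*}$ on a space $W_{j^*}$ of dimension different from one. In contrast, $(2\pi\iota)^{-1}\oint_{\mathcal{B}_{j^*}}(zI - \Gamma)^{-1}dz$ is the orthogonal projector $\pi_{j^*}$ on $V_{j^*}$. Consider

$$\frac{1}{2\pi\iota}\oint_{\mathcal{B}_{j^*}}\bigl[(zI - \hat\Gamma_n)^{-1} - (zI - \Gamma)^{-1}\bigr]dz = \hat\pi_{j^*} - \pi_{j^*} .$$

If $\dim(W_{j^*}) = 0$, then $\|\hat\pi_{j^*} - \pi_{j^*}\|_\infty = 1$. If $\dim(W_{j^*}) \ge 2$, then there exists a vector $V$ in $W_{j^*}$ such that $\pi_{j^*}V = 0$. As a consequence, we have $\|\hat\pi_{j^*} - \pi_{j^*}\|_\infty \ge 1$. For any $z \in \mathcal{B}_{j^*}$, $S_n(z)$ is well defined since no eigenvalue of $\hat\Gamma_n$ lies on $\mathcal{B}_{j^*}$. It follows that

$$
\begin{aligned}
1 &\le \Bigl\|\frac{1}{2\pi\iota}\oint_{\mathcal{B}_{j^*}}(zI - \hat\Gamma_n)^{-1}(\hat\Gamma_n - \Gamma)(zI - \Gamma)^{-1}dz\Bigr\|_\infty \\
&\le \frac{1}{2\pi}\oint_{\mathcal{B}_{j^*}}\bigl\|(zI - \Gamma)^{-1/2}S_n(z)T_n(z)(zI - \Gamma)^{-1/2}\bigr\|_\infty |dz| \\
&\le \frac{1}{2\pi}\oint_{\mathcal{B}_{j^*}}\bigl\|(zI - \Gamma)^{-1/2}\bigr\|_\infty^2\,\|S_n(z)\|_\infty\|T_n(z)\|_\infty |dz| \\
&\le \sup_{z\in\mathcal{B}_{j^*}}\|S_n(z)\|_\infty\|T_n(z)\|_\infty
\end{aligned}
\tag{D.1}
$$

since $\|(zI - \Gamma)^{-1}\|_\infty \le 2/\delta_{j^*}$ and the contour $\mathcal{B}_{j^*}$ has length $\pi\delta_{j^*}$. Moreover, we have $S_n(z)(I - T_n(z)) = I$. We can assume that $\sup_{z\in\mathcal{B}_{j^*}}\|T_n(z)\|_\infty < 0.9$, otherwise $\mathcal{E}_n$ is true. Then, we have $\|S_n(z)\|_\infty \le (1 - \|T_n(z)\|_\infty)^{-1}$. Gathering this bound with (D.1) leads to $\sup_{z\in\mathcal{B}_{j^*}}\|T_n(z)\|_\infty \ge 0.5$, which allows us to conclude that $\mathcal{D}_2 \setminus \mathcal{D}_1 \subset \mathcal{E}_n$.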
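The argument above rests on the fact that the contour integral of the resolvent, divided by $2\pi\iota$, is the orthogonal projector onto the eigenspaces whose eigenvalues lie inside the contour. The following minimal sketch illustrates this on a hypothetical $2\times 2$ symmetric matrix (not taken from the paper) using the trapezoidal rule on a circle; the function names are illustrative only.

```python
import cmath
import math

def resolvent_2x2(z, m):
    """Return (zI - m)^{-1} for a 2x2 matrix m = ((a, b), (c, d))."""
    (a, b), (c, d) = m
    det = (z - a) * (z - d) - b * c
    return (((z - d) / det, b / det), (c / det, (z - a) / det))

def riesz_projector(m, center, radius, n_points=64):
    """Approximate (2*pi*i)^{-1} * integral of (zI - m)^{-1} dz over the
    circle |z - center| = radius with the trapezoidal rule."""
    acc = [[0j, 0j], [0j, 0j]]
    for s in range(n_points):
        phase = cmath.exp(2j * math.pi * s / n_points)
        z = center + radius * phase
        res = resolvent_2x2(z, m)
        # dz = i * radius * phase * dtheta; the factor i cancels with 1/(2*pi*i).
        weight = radius * phase / n_points
        for r in range(2):
            for c in range(2):
                acc[r][c] += res[r][c] * weight
    return [[entry.real for entry in row] for row in acc]

# Toy "covariance" with eigenvalues 3 and 1 (eigenvectors (1, 1) and (1, -1)).
gamma = ((2.0, 1.0), (1.0, 2.0))
projector = riesz_projector(gamma, center=3.0, radius=0.5)  # circle around 3
empty = riesz_projector(gamma, center=2.0, radius=0.25)     # no eigenvalue inside
```

Here `projector` approximates the rank-one projector onto the direction $(1,1)/\sqrt{2}$, whose entries are all $0.5$, while `empty` approximates the zero matrix. This mirrors the dichotomy used in the proof: the dimension of the range is the number of eigenvalues enclosed by the contour.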
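Event $\mathcal{D}_3 \setminus \mathcal{D}_1$. Assume that $\mathcal{D}_3 \setminus \mathcal{D}_1$ is true. Arguing as for $\mathcal{D}_2$, we derive that for some $2 \le j^* \le k_n$, we have $\delta'_{j^*} > \delta_{j^*}$ and

$$\Bigl\|\frac{1}{2\pi\iota}\oint_{\mathcal{B}'_{j^*}}(zI - \hat\Gamma_n)^{-1}(\hat\Gamma_n - \Gamma)(zI - \Gamma)^{-1}dz\Bigr\|_\infty \ge 1 . \tag{D.2}$$

We have proved above that

$$(zI - \hat\Gamma_n)^{-1}(\hat\Gamma_n - \Gamma)(zI - \Gamma)^{-1} = (zI - \Gamma)^{-1/2}S_n(z)T_n(z)(zI - \Gamma)^{-1/2} ,$$

where $S_n(z) = (I - T_n(z))^{-1}$ is well defined for any $z \in \mathcal{B}'_{j^*}$. By a straightforward induction, we get for any positive integer $p$

$$\oint_{\mathcal{B}'_{j^*}}(zI - \hat\Gamma_n)^{-1}(\hat\Gamma_n - \Gamma)(zI - \Gamma)^{-1}dz = \sum_{k=1}^{p-1}\oint_{\mathcal{B}'_{j^*}}(zI - \Gamma)^{-1/2}T_n^k(z)(zI - \Gamma)^{-1/2}dz + \oint_{\mathcal{B}'_{j^*}}(zI - \Gamma)^{-1/2}S_n(z)T_n^p(z)(zI - \Gamma)^{-1/2}dz .$$

Observe that each integral $\oint_{\mathcal{B}'_{j^*}}(zI - \Gamma)^{-1/2}T_n^k(z)(zI - \Gamma)^{-1/2}dz$ is zero since the operator $(zI - \Gamma)^{-1/2}$ has no pole inside $\mathcal{B}'_{j^*}$. Assume that the event $\mathcal{E}'_n$ does not hold. Then, we can bound $\|S_n(z)\|_\infty$ by $(1 - \|T_n(z)\|_\infty)^{-1}$ as above. As a consequence, we obtain that for any positive integer $p$,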



1 

2^ 



zi -r, 



r n -r) {zi-vy 1 dz 



< 



2n 



{zi-ry 1/2 



2 \\T n (z)\\^ 
oo 1 - \\T n (z) 



-dz < 



2PS 4 



Taking $p$ large enough in this last upper bound contradicts (D.2). Thus, $(\mathcal{D}_3 \setminus \mathcal{D}_1) \cap \overline{\mathcal{E}'_n} = \emptyset$, which allows us to conclude. □

Proof of Lemma D.2. The first bound is a straightforward consequence of Lemma 10.2 
since £ n = Uj£tc n £j tTl . The second bound proceeds from the same approach as Lemma 
10.2. 

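Let us turn to the third bound. By Weyl's theorem (e.g. Theorem 4.3.1 in [26]), we have $|\hat\lambda_1 - \lambda_1| \le \|\hat\Gamma_n - \Gamma\|_\infty$ so that

$$\mathbb{P}\Bigl[\hat\lambda_1 \ge \frac{3\lambda_1 - \lambda_2}{2}\Bigr] \le \mathbb{P}\Bigl[|\hat\lambda_1 - \lambda_1| \ge \frac{\lambda_1 - \lambda_2}{2}\Bigr] \le \mathbb{P}\Bigl[\|\hat\Gamma_n - \Gamma\|_{HS} \ge \frac{\lambda_1 - \lambda_2}{2}\Bigr] \le \frac{4}{(\lambda_1 - \lambda_2)^2}\mathbb{E}\bigl[\|\hat\Gamma_n - \Gamma\|_{HS}^2\bigr] .$$

We have

$$\mathbb{E}\bigl[\|\hat\Gamma_n - \Gamma\|_{HS}^2\bigr] = \sum_{k,l=1}^{\infty}\mathbb{E}\bigl[\langle(\hat\Gamma_n - \Gamma)V_k, V_l\rangle^2\bigr] \le \frac{C}{n}\Bigl(\sum_{k=1}^{\infty}\lambda_k\Bigr)^2$$

by Assumption B.1. By Assumption B.2, $2\lambda_2 \le \lambda_1$. Applying Lemma 10.1, we get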



Ai > 



3Ai - A 2 



< 



n \ Ai 



□ 
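The third bound starts from Weyl's inequality, $|\hat\lambda_1 - \lambda_1| \le \|\hat\Gamma_n - \Gamma\|_\infty \le \|\hat\Gamma_n - \Gamma\|_{HS}$. As an illustration only, the sketch below checks both inequalities on randomly perturbed $2\times 2$ symmetric matrices, where eigenvalues have a closed form. The matrices, the perturbation size and the function names are assumptions made for this example, not objects from the paper.

```python
import math
import random

def top_eigenvalue(a, b, d):
    """Largest eigenvalue of the symmetric matrix [[a, b], [b, d]]."""
    return (a + d) / 2.0 + math.hypot((a - d) / 2.0, b)

def operator_norm(a, b, d):
    """Operator norm of [[a, b], [b, d]], i.e. its largest absolute eigenvalue."""
    mean = (a + d) / 2.0
    half_gap = math.hypot((a - d) / 2.0, b)
    return max(abs(mean + half_gap), abs(mean - half_gap))

def hs_norm(a, b, d):
    """Hilbert-Schmidt (Frobenius) norm of [[a, b], [b, d]]."""
    return math.sqrt(a * a + 2.0 * b * b + d * d)

def check_weyl(trials=1000, seed=0, tol=1e-12):
    """Return True if |top(G + E) - top(G)| <= ||E||_op <= ||E||_HS in all trials."""
    rng = random.Random(seed)
    for _ in range(trials):
        g = [rng.uniform(-1.0, 1.0) for _ in range(3)]   # entries (a, b, d) of G
        e = [rng.uniform(-0.1, 0.1) for _ in range(3)]   # entries of the perturbation E
        g_hat = [x + y for x, y in zip(g, e)]
        shift = abs(top_eigenvalue(*g_hat) - top_eigenvalue(*g))
        if shift > operator_norm(*e) + tol:
            return False
        if operator_norm(*e) > hs_norm(*e) + tol:
            return False
    return True
```

Calling `check_weyl()` returns `True` when no trial violates either inequality. Passing from the operator norm to the Hilbert-Schmidt norm is what allows the proof to control the eigenvalue deviation through second moments of $\hat\Gamma_n - \Gamma$.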



Hilgert et al./ Functional tests 






Appendix E: Proofs of technical details 

Proof of Lemma 9.1. We have $\|\mathbf{Y} - \hat\Pi_k\mathbf{Y}\|_n^2 = \|\mathbf{Y}\|_n^2 - \|\hat\Pi_k\mathbf{Y}\|_n^2$. By the central limit theorem, the classical Berry-Esseen inequality, and a classical deviation inequality of $\chi^2$ random variables (e.g. Lemma 1 in [30]), we get



log(l/x) 2 \og(l/x) 



<2x + C 



E(|e|< 



(E.l) 



for any x > 0. Let us compute the expectation of ||IIfcY| 



E 



in,Y|| 



= E 



r 2 E 



E{||n fe Y|| 2 |x} 

< a 2 k 



= E 



E 



tr[Y*n fc Y]|X 



tr 



Applying Markov inequality to ||IIfcY||^ and gathering this deviation inequality with 
(E.l), we conclude that 



> fclog (n) | g /log log n 



< 



C 



\og 2 {n) Vn 



uniformly over all k < k n . 



□ 



Details of the proof of Theorem 6.1. Here, we provide some details on the comparison 
between the lower bound (29) and the quantile (30). By Assumption B'.3, we derive 
that 0a_,(Y,X) — k^ n _ k (a/\K n \) is positive with probability larger than 1 — 3/3/4 — 

C( 7 )log- 1 (n) if 



C[n\\Tl /2 e\\ 2 - C' 2 a 2 (7fclog(l//3) + log(l//3)) - 



^_ll (r i/2_ r V2 )e ||2 



log(n 



'fclog 



\K„ 



+%|r 1/2 0|| 2 



k V log- 



log 



|/C n | 



n V n 



Since log(|/C„|/a) < k < n 1 / 4 and /3 > C(j)/ log(n), we derive that for n larger 

than a numerical quantity, <^fc(Y, X) — k (a/\K, n \) is positive with probability larger 

than 1 - 3/3/4 - C( 7 ) log _1 (?i) if 



lir 1 / 2 ^! 2 > Ci || (r 1/2 - r k 



V2\/)||2 , 2^*2 



'fclog 



\fCn\ 



log 



\IC n \ 



1a 



□ 



Proof of Lemma 9.9. We have shown in the proof of Lemma 9.3 that 



E 



y^i (l fe - A fc ) A n> i 



<C(7) 



"fc3log 2 (n) 
Gathering this bound with Markov inequality and Assumption B'.3 allows us to derive the second lower bound of Lemma 9.9. Focusing on the first bound, we shall prove the following stronger result. For any $x > 0$, $k \le k_n$ and any $n \ge 1$,



fo(A k - A k )T n 6\\ > xj < ¥[A n ] 

+c(7) !^|W||rV^|| 2 



a 



log(n) 



fc 3 log 2 (fcVe) _k_ fc» /2 log(fc w V e) 
n -v/n n 



.(E.2) 



If we take x = HF 1 / 2 ^ y / n/(fc 1 / 4 log(n)) in this inequality and if we combine it with 
Lemma 9.1 and Assumption B'.3, we recover the conclusion of Lemma 9.9. 



Define the event U n := {HT, 1 / 2 ^ 2 > log^Hr 1 / 2 ^ 2 }. Since 



E 



n 



= l|r 1/2 tf|| 2 , 



we derive 



P [W„] < 



log(n) 



(E.3) 



We bound V[\\(A k - A k )T n 9\\ > x] as follows 



(A k -A k ) r, 



< 



> X 



-E 



< 



{IK 



A fc - A fc r 



a & - A fe r„ 



L U n nA„ 



> x\ n w„ n A 
+ p [w„ u A.] 



1 [w n u A 



< — ^-E 



A k A k ) f !/ 2 



fy 2 ^ 



As a consequence, we have to investigate 



A k -A k ]Y x J 2 



L u n nA„ 



HS 



+ P [U n U A n ] 

+ p [w„ u .A„] 



(4-A.)!V|„-tr( 



= tr ( A k - A fc ] T„ ( A k - A k 



= tiA k T n A k - tiA k T n A k - tiA k T n A k + tr A k T n A k . 
Arguing as in the proof of Lemma 9.3, we take the expectation 



E 



A k -A k )f 1 J 2 



E 



trllfcl^ 
2E ' 



HS 

E 



[trna x J + E [ tr (M^n - r)A fc ) i : 



-1/2 



< 2E 

< 2E 



k - trPyfr 



l/2 p -l/2 



fe fc 



}^„] + \/ E [ tr2 ( 



tr 2 f A fc (r„ - r)A fc )l v^Rn 



{ 

*• { r * 1/2 ( r i /2 - P,Y, 2 ) } is.] + y/E[tr»(A t (f„-rM t ) , — 
These expectations have already been upper bounded in (27) and (28). Thus, we derive

-2/ tw ^ „ £5/2, 



E 



-4„ 



<C( 7 ) 



fc 3 log 2 (fcVe) v _^ v fc » " lQ g( fc " V e ) 
n \/n n 



Gathering this last bound with (E.3) and (E.4) allows us to derive the desired inequality (E.2). □

Proof of Lemma 9.10. Observe that $\|\mathbf{Y} - \hat\Pi_k\mathbf{Y}\|_n^2 \le \|\mathbf{Y}\|_n^2 = \|\epsilon\|_n^2 + 2\sum_{i=1}^n \epsilon_i\langle X_i, \theta\rangle + \sum_{i=1}^n \langle X_i, \theta\rangle^2$. By Assumption B.1 and Chebyshev's inequality, $\|\epsilon\|_n^2 \le \sigma^2(n + C\sqrt{n\log(n)})$ with probability larger than $1 - 1/\log(n)$. Chebyshev's inequality also tells us that $\sum_{i=1}^n \epsilon_i\langle X_i, \theta\rangle \le \sqrt{n\log(n)}\,(\sigma^2 + \|\Gamma^{1/2}\theta\|^2)$ with probability larger than $1 - 1/\log(n)$. Furthermore, we apply Markov inequality to derive that $\sum_{i=1}^n \langle X_i, \theta\rangle^2 \le 4n\|\Gamma^{1/2}\theta\|^2/\beta$ with probability larger than $1 - \beta/4$. Since $k < n/2$, we conclude that



n h Y|| 



- fc 



l + C - 
\ n 



log(n) 



c"||r 1/2 6>|| 2 /^ 



with probability larger than 1 — 2/log(n) — /3/4. 



□ 



Proof of Lemma 9.5. Define t = 8 ^ log l ° s " + k l ° 6 n (n) + k ^ (n) . First, we use the following 
bound that will be proved at the end of the proof: 



As a consequence, we have 



Xk 



|/c„| 



|/C„ 



< Xfc 



/ 1+4 



log(n) 



(E.4) 



log(") 



Since for any $0 < u < 1$ and any integer $k \ge 1$, we have $\bar\chi_k^{-1}(u) \le k + 2\sqrt{k\log(1/u)} + 2\log(1/u)$ (e.g. Lemma 1 in [30]), it follows from Assumption B'.3 that



(i-*)x fe _1 (pe 



> 



> 



Xk 



Xk 





_ 1_ 




n 


log(n) 




n 




a 


i 

h - 


\Kn\ 


n 


a 


1 

h - 


\IC n \ 





(7(a) [fc V log(n)] 

g(a) 
log(ra) 



log(n) fclog (ri) 1 
n n fclog (n) 



Let us note $f_{\chi_k}(x)$ the density at $x$ of a $\chi^2$ random variable with $k$ degrees of freedom. Consider some positive numbers $x$ and $u$ such that $x \ge u$.

$$\frac{\bar\chi_k(x - u)}{\bar\chi_k(x)} = 1 + \frac{\mathbb{P}(x - u \le \chi^2(k) \le x)}{\bar\chi_k(x)} \le 1 + \frac{u\sup_{t\in[x-u;x]}f_{\chi_k}(t)}{\bar\chi_k(x)} \le 1 + ue^{u/2}\frac{f_{\chi_k}(x)}{\bar\chi_k(x)} ,$$

since $f_{\chi_k}(x) = x^{k/2-1}e^{-x/2}/[2^{k/2}\Gamma(k/2)]$. By integration by parts, one observes that $f_{\chi_k}(x)/\bar\chi_k(x) \le 1/2$. As a consequence, we have $\bar\chi_k(x - u) \le \bar\chi_k(x)[1 + \frac{u}{2}e^{u/2}]$ for any $u \le x$. This upper bound also holds when $u > x$.

C{a) 



Xk 



a + 1 
|/C„| nj log(n) 



< 



< 







' a 1 




X? [ 


v /C„| n 


a 


i0- 


f C 3 (a)\ 


\JC n 


log(n)/ 



g 2 (a) 
log(ra) 



which allows us to derive the desired result. 

To finish the proof, we need to prove (E.4). Let $X_k$ and $X_{n-k}$ respectively denote two independent random variables that follow a $\chi^2$ distribution with $k$ and $n - k$ degrees of freedom. Moreover we define $F_{k,n-k}$ as $X_k(n - k)/(X_{n-k}k)$. Since $\bar\chi_k^{-1}(u) \le k + 2\sqrt{k\log(1/u)} + 2\log(1/u)$, we have



\IC n \ 



Xk > Xk 1 



a 1 



< 



< 



kF, 



k.n—k 



> 



\K. n \ n 

te) 



1 

n 



kF, 



k,n—k 



> 



n — k 



1 

n 



1 + 4 



log(") 



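since $4\sqrt{\log(n)/n} \ge 2\sqrt{\log(n)/(n - k)} + 2\log(n)/(n - k)$ for $k < n/2$ and $n$ large enough. We conclude that $k\bar F^{-1}_{k,n-k}(\alpha/|\mathcal{K}_n|) \ge \bar\chi_k^{-1}\bigl(\alpha/|\mathcal{K}_n| + 1/n\bigr)/\bigl(1 + 4\sqrt{\log(n)/n}\bigr)$ for $k < n/2$ and $n$ large enough.

□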
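The elementary inequality $4\sqrt{\log(n)/n} \ge 2\sqrt{\log(n)/(n-k)} + 2\log(n)/(n-k)$ invoked at the end of this proof only holds for $n$ large enough. The following sketch, with hypothetical helper names, evaluates both sides so that the range of validity can be explored for a given $n$.

```python
import math

def gap(n, k):
    """Left-hand side minus right-hand side of the inequality
    4*sqrt(log(n)/n) >= 2*sqrt(log(n)/(n-k)) + 2*log(n)/(n-k)."""
    lhs = 4.0 * math.sqrt(math.log(n) / n)
    rhs = 2.0 * math.sqrt(math.log(n) / (n - k)) + 2.0 * math.log(n) / (n - k)
    return lhs - rhs

def holds_for_all_k(n):
    """True when the inequality holds for every integer 1 <= k <= n/2."""
    return all(gap(n, k) >= 0.0 for k in range(1, n // 2 + 1))
```

For instance, `holds_for_all_k(1000)` is `True`, whereas `holds_for_all_k(10)` is `False` because the inequality fails at $k = 5$. The right-hand side increases with $k$, so the binding case is always the largest admissible $k$.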
Proof of Lemma 9.6. Arguing as above, we get 



Applying the inequality $\bar\chi_k^{-1}(u) \le k + 2\sqrt{k\log(1/u)} + 2\log(1/u)$ for any $0 < u < 1$ and Condition B.3, we get



Xk 



kO-~t)Kn-k («) 



< 



Xfc 



x fc 



1 - < 



n/ 1 + 40og(n)/n 



< Xk 

< ail 



Xk' 



1 



n J log(n) 



C(a) 
log(n) 

where we have used $\bar\chi_k(x - u) \le \bar\chi_k(x)[1 + \frac{u}{2}e^{u/2}]$ in the last inequality.



□ 
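Two $\chi^2$ facts drive the proofs of Lemmas 9.5 and 9.6: the quantile bound $\bar\chi_k^{-1}(u) \le k + 2\sqrt{k\log(1/u)} + 2\log(1/u)$ and the shift inequality $\bar\chi_k(x-u) \le \bar\chi_k(x)[1 + \frac{u}{2}e^{u/2}]$. The sketch below checks both numerically. It assumes an even number of degrees of freedom, for which the survival function has a closed form, and all function names are illustrative.

```python
import math

def chi2_survival_even(k, x):
    """P(chi2(k) >= x) for an even k, using the closed form
    exp(-x/2) * sum_{i < k/2} (x/2)^i / i!."""
    if k <= 0 or k % 2 != 0:
        raise ValueError("this closed form needs a positive even k")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

def quantile_upper_bound(k, u):
    """k + 2*sqrt(k*log(1/u)) + 2*log(1/u)."""
    t = math.log(1.0 / u)
    return k + 2.0 * math.sqrt(k * t) + 2.0 * t

def check_bounds(ks=(2, 4, 10, 50), us=(0.5, 0.1, 0.01, 1e-4)):
    """Return True if both inequalities hold on the whole grid."""
    for k in ks:
        for u in us:
            x = quantile_upper_bound(k, u)
            # A tail mass of at most u at x means the (1-u)-quantile is below x.
            if chi2_survival_even(k, x) > u:
                return False
            for s in (0.1, 0.5, 1.0):
                bound = chi2_survival_even(k, x) * (1.0 + 0.5 * s * math.exp(0.5 * s))
                if chi2_survival_even(k, x - s) > bound * (1.0 + 1e-12):
                    return False
    return True
```

The grid of values for `k`, `u` and the shift `s` is arbitrary. Both inequalities are deterministic, so a returned `False` would indicate a numerical problem rather than bad luck.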



Proof of Lemma 10.1. Since $(j\lambda_j)_{j\in\mathbb{N}}$ is a decreasing sequence, $j\lambda_j \ge k\lambda_k$ for $k \ge j$.
Hence, we get

fe-i . fe-i fe-i 
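Similarly, $\sum_{j=k+1}^{2k}\lambda_j/(\lambda_k - \lambda_j) \le k\sum_{j=k+1}^{2k}(j - k)^{-1} = k\sum_{j=1}^{k}j^{-1}$. Now we focus on $\sum_{j\ge 2k+1}\lambda_j/(\lambda_k - \lambda_j)$. The assumption on the eigenvalues implies that for $j > k$,

$$k\log^{1+\gamma}(k\vee 2)\,(\lambda_k - \lambda_j) \ge \bigl(j\log^{1+\gamma}j - k\log^{1+\gamma}(k\vee 2)\bigr)\lambda_j .$$

Thus, we get

$$\frac{\lambda_j}{\lambda_k - \lambda_j} \le \Bigl[\frac{j\log^{1+\gamma}j}{k\log^{1+\gamma}(k\vee 2)} - 1\Bigr]^{-1}$$

and

$$\sum_{j\ge 2k+1}\frac{\lambda_j}{\lambda_k - \lambda_j} \le \int_{2k}^{+\infty}\Bigl[\frac{x\log^{1+\gamma}x}{k\log^{1+\gamma}(k\vee 2)} - 1\Bigr]^{-1}dx .$$

For $x \ge 2k$, we have

$$x\log^{1+\gamma}x \ge 2k\log^{1+\gamma}(2k) \ge 2k\log^{1+\gamma}(k\vee 2)$$

so that

$$\frac{x\log^{1+\gamma}x}{k\log^{1+\gamma}(k\vee 2)} - 1 \ge \frac{x\log^{1+\gamma}x}{2k\log^{1+\gamma}(k\vee 2)} .$$

It follows that

$$\int_{2k}^{+\infty}\Bigl[\frac{x\log^{1+\gamma}x}{k\log^{1+\gamma}(k\vee 2)} - 1\Bigr]^{-1}dx \le 2k\log^{1+\gamma}(k\vee 2)\int_{2k}^{+\infty}\frac{dx}{x\log^{1+\gamma}x} = \frac{2k\log^{1+\gamma}(k\vee 2)}{\gamma\log^{\gamma}(2k)} \le \frac{2k\log(k\vee 2)}{\gamma} .$$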
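The tail bound $\sum_{j\ge 2k+1}\lambda_j/(\lambda_k - \lambda_j) \le 2k\log(k\vee 2)/\gamma$ can be checked numerically on an example. The sketch below uses the hypothetical eigenvalue sequence $\lambda_j = 1/(j\log^{1+\gamma}(j\vee 2))$, chosen for illustration because it makes $j\log^{1+\gamma}(j\vee 2)\lambda_j$ constant, and a truncated sum, which can only underestimate the full series.

```python
import math

def tail_sum(k, gamma, n_terms=20000):
    """Partial sum over j = 2k+1, ..., 2k+n_terms of lambda_j / (lambda_k - lambda_j)
    for the illustrative choice lambda_j = 1 / (j * log(max(j, 2)) ** (1 + gamma))."""
    def weight(j):
        return j * math.log(max(j, 2)) ** (1.0 + gamma)
    w_k = weight(k)
    # With this choice, lambda_j / (lambda_k - lambda_j) = 1 / (weight(j) / weight(k) - 1).
    return sum(1.0 / (weight(j) / w_k - 1.0)
               for j in range(2 * k + 1, 2 * k + 1 + n_terms))

def integral_bound(k, gamma):
    """The bound 2 * k * log(max(k, 2)) / gamma obtained by integral comparison."""
    return 2.0 * k * math.log(max(k, 2)) / gamma
```

Comparing `tail_sum(k, gamma)` with `integral_bound(k, gamma)` for a few pairs such as `(1, 0.5)` or `(20, 1.0)` shows the partial sums staying below the bound, as the sum-to-integral comparison predicts for a decreasing summand.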



All in all, we conclude that 



E 



I A fe - I 



< 



2fclog(/cV2) 



fe 

+ 2 E^ 



□ 



